Abstract

Gaussian channels with memory and with noiseless feedback have been widely studied in the 
information theory literature. However, a coding scheme to achieve the feedback capacity is not available. 

In this paper, a coding scheme is proposed to achieve the feedback capacity for Gaussian channels. The 
coding scheme essentially implements the celebrated Kalman filter algorithm, and is equivalent to an 
estimation system over the same channel without feedback. It reveals that the achievable information rate 
of the feedback communication system can be alternatively given by the decay rate of the Cramér-Rao
bound of the associated estimation system. Thus, combined with the control theoretic characterizations 
of feedback communication (proposed by Elia), this implies that the fundamental limitations in feedback 
communication, estimation, and control coincide. This leads to a unifying perspective that integrates 
information, estimation, and control. We also establish the optimality of the Kalman filtering in the 
sense of information transmission, a supplement to the optimality of Kalman filtering in the sense 
of information processing proposed by Mitter and Newton. In addition, the proposed coding scheme 
generalizes the Schalkwijk-Kailath codes and reduces the coding complexity and coding delay. The 
construction of the coding scheme amounts to solving a finite-dimensional optimization problem. A 
simplification to the optimal stationary input distribution developed by Yang, Kavcic, and Tatikonda is 
also obtained. The results are verified in a numerical example. 

I. Introduction 

Communication systems in which the transmitters have access to noiseless feedback of channel
outputs have been widely studied. As one of the most important cases, the single-input single-output
frequency-selective Gaussian channels with feedback have attracted considerable attention; see [1]-[16]
and references therein for the capacity computation and coding scheme design for these channels. In
particular, [1], [2] proposed ingenious feedback codes (called the Schalkwijk-Kailath codes, in short
the SK codes) for additive white Gaussian noise (AWGN) channels, which achieve the asymptotic
feedback capacity (i.e. the infinite-horizon feedback capacity, denoted $C_\infty$) and greatly reduce the coding
complexity and coding delay. [4], [5], [7] presented extensions of the SK codes to Gaussian feedback
channels with memory and obtained tight capacity bounds.

[6] presented a rather general coding structure (called the Cover-Pombra structure, in short the CP
structure) to achieve the finite-horizon feedback capacity (denoted $C_T$, where the horizon spans from
time epoch 0 to time epoch $T$) for Gaussian channels with memory; however, it involves prohibitive
computation complexity as the coding length $(T + 1)$ increases. By exploiting the special properties of
a moving-average Gaussian channel with feedback, [9] discovered the finite rankness of the innovations
in the CP structure, which reduces the computation complexity. [10] reformulated the CP structure
along this direction, and obtained an SK-based coding scheme to achieve $C_T$ with reduced computation
complexity. Also along the line of [9], [15] studied a first-order moving-average Gaussian channel with
feedback, found the closed-form expression for $C_\infty$, and obtained an SK-based coding scheme to achieve
$C_\infty$.


[11] provided a thorough study of feedback capacity: it extended the notion of directed information
proposed in [17] and proved that its supremum is the feedback capacity, reformulated the problem of
computing $C_T$ as a stochastic control optimization problem, and proposed a dynamic programming
based solution. This idea was further explored in [12], which uncovered the Markov property of the
optimal input distributions for Gaussian channels with memory and eventually reduced the finite-horizon
stochastic control optimization problem to a manageable size. Moreover, under a stationarity conjecture
that $C_\infty$ equals the stationary capacity (the maximum information rate over all stationary input
distributions, denoted $C_s$), $C_\infty$ is given by the solution of a finite-dimensional optimization problem. This is
the first computationally efficient^1 method to calculate $C_s$ or $C_T$ for general Gaussian channels. The
stationarity conjecture has recently been confirmed, namely $C_s = C_\infty$, and $C_\infty$ is achievable using an
(asymptotically) stationary input distribution [16].

[3] proposed viewing optimal communication over an AWGN channel with feedback
as a control problem. [13] investigated the problem of tracking unstable sources over a channel and
introduced the notion of anytime capacity to capture the fundamental limitations in that problem, which
reveals intimate connections between communication and control and brings new insights to feedback
communication problems. Furthermore, [14] established the equivalence between feedback communication
and feedback stabilization over Gaussian channels with memory, showed that the achievable
transmission rate is given by the Bode sensitivity integral of the associated control system, and presented
an optimization problem based on robust control to compute lower bounds of $C_s$. [14] also extended
the SK codes to achieve these lower bounds, and the coding schemes have an interpretation of tracking
unstable sources over Gaussian channels.

For Gaussian networks with feedback, tight capacity bounds can be found in [14], [18], [19]. For
time-selective fading channels with AWGN and with feedback, an SK-based coding scheme utilizing
the channel fading information was constructed in [20] to achieve the ergodic capacity.

As we can see, it remains an open problem to build a coding scheme with reasonable complexity to
achieve $C_\infty$ for a Gaussian channel with memory; note that no practical codes have been found based
on the optimal signalling strategy in [12]. In this paper, we propose a coding scheme for
frequency-selective Gaussian channels with output feedback. This coding scheme achieves $C_\infty$, the asymptotic
feedback capacity of the channel; utilizes the Kalman filter algorithm; simplifies the coding processes;
and shortens the coding delay. The optimal coding structure is essentially a finite-dimensional linear
time-invariant (FDLTI) system, is also an extension of the SK codes, and leads to a further simplification of the
optimal stationary signalling strategy in [12]. The construction of the coding system amounts to solving
a finite-dimensional optimization problem. Our solution holds for AWGN channels with intersymbol
interference (ISI) where the ISI is modeled as a stable and minimum-phase FDLTI system; through the
equivalence shown in [11], [12], this channel is equivalent to a colored Gaussian channel with rational
noise power spectra and without ISI. Note that the rationality assumption is widely used and not
too restrictive, since any power spectrum can be approximated arbitrarily well by rational ones.

In deriving our optimal coding design in infinite-horizon, we first present finite-horizon analysis (which
is closely related to the CP structure) of the feedback communication problem, and then let the horizon
length tend to infinity and obtain our optimal coding design, which achieves $C_\infty$. More specifically, in
our finite-horizon analysis, we establish the necessity of the Kalman filter: The Kalman filter is not
only a device to provide sufficient statistics (which was shown in [12]), but also a device to ensure
the power efficiency and to recover the message optimally. This also leads to a refinement of the CP

^1 Here we do not mean that their optimization problem is convex. In fact the computation complexity for $C_{fb,T}$ is $O(T + 1)$,
and for $C_{fb,\infty}$ the complexity is determined mainly by the channel order, which does not involve prohibitive computation if
the channel order is not too high.



structure, applicable for generic Gaussian channels. Additionally, the presence of the Kalman filter in 
our coding scheme reveals the intrinsic connections among feedback communication, estimation, and 
control. In particular, we show that the feedback communication problem over a Gaussian channel is 
essentially an optimal estimation problem, and the achievable rate of the feedback communication system 
is alternatively given by the decay rate of the Cramér-Rao bound (CRB) for the associated estimation
system. Invoking the Bode sensitivity characterization of the achievable rate [14], we conclude that the
fundamental limitations in feedback communication, estimation, and control coincide. We then extend
the horizon to infinity and characterize the steady-state of the feedback communication problem. We
finally show that our optimal scheme achieves $C_\infty$.

We also remark that the necessity of the Kalman filter in the optimal coding scheme is not surprising,
given various indications of the essential role of Kalman filtering (or minimum mean-squared error
(MMSE) estimators; or minimum-energy control, its control theory equivalent; or the sum-product
algorithm, its generalization) in optimal communication designs. See e.g. [12], [14], [21]-[24]. The
study of the Kalman filter in the feedback communication problem along the line of [24] may yield
important insights into optimal communication problems and is under current investigation.

One main insight gained in this study is that the perspective of unifying information, estimation, and
control, three fundamental concepts, facilitates our development of the optimal feedback communication
design. Though the connections between any two of the three concepts have been investigated or are under
investigation, a joint study explicitly addressing all three is not available. To the best of our knowledge,
our study provides the first example in which the connections among the three can be explored and utilized.
In addition to helping us to achieve optimality in the feedback communication problem, this new
perspective establishes the optimality of Kalman filtering in the sense of information transmission,
a supplement to the optimality of Kalman filtering in the sense of information processing proposed
by Mitter and Newton [24]. It also leads to a new formula connecting the mutual information in the
feedback communication system and the MMSE in the associated estimation problem, a supplement to
a fundamental relation between mutual information and MMSE proposed by Guo, Shamai, and Verdú
[25]. We anticipate that this new perspective may help us to study more general feedback communication
problems in future investigations, such as multiuser feedback communications.

This paper is organized as follows. In Section II we introduce the channel models. The problem
formulation is given in Section III, followed by the problem solution, i.e. the optimal coding scheme
and the coding theorem. In Section IV we prove the necessity of the Kalman filter in generating the
optimal feedback. In Section V we provide the connections of the feedback communication problem
to an estimation problem and a control problem, and express the maximum achievable rate in terms
of estimation theory quantities and control theory quantities. In Section VII we show that our coding
scheme is capacity-achieving. Section VIII provides a numerical example. Finally we conclude the paper
and discuss future research directions.

Notations: We represent time indices by subscripts, such as $y_t$. We denote by $y^T$ the vector
$\{y_0, y_1, \ldots, y_T\}$, and by $\{y_t\}$ the sequence $\{y_t\}_{t=0}^{\infty}$. We assume that the starting time
of all processes is 0, consistent with the convention in dynamical systems but different from the
information theory literature. We use $h(X)$ for the differential entropy of the random variable $X$. For a
random vector $y^T$, we denote its covariance matrix as $K_y^{(T)}$. For a stationary process $\{y_t\}$, we denote
its power spectrum as $S_y(e^{j2\pi\theta})$. We denote $T_{xy}(z)$ as the transfer function from $x$ to $y$. We denote
"defined to be" as ":=". We use $(A, B, C, D)$ to represent the system

$$ \begin{aligned} x_{t+1} &= A x_t + B u_t \\ y_t &= C x_t + D u_t. \end{aligned} \tag{1} $$


II. Channel model 


In this section, we briefly describe two Gaussian channel models, namely the colored Gaussian noise 
channel without ISI and white Gaussian noise channel with ISI. 

A. Colored Gaussian noise channel without ISI 

Fig. 1(a) shows a colored Gaussian noise channel without ISI. At time $t$, this discrete-time channel
is described as

$$ y_t = u_t + Z_t, \quad \text{for } t = 0, 1, \ldots, \tag{2} $$

where $u_t$ is the channel input, $Z_t$ is the channel noise, and $y_t$ is the channel output. We make the following
assumptions: The colored noise $\{Z_t\}$ is the output of a finite-dimensional stable and minimum-phase
linear time-invariant (LTI) system $\mathcal{Z}(z)$ driven by a white Gaussian process $\{N_t\}$ of zero mean and
unit variance, and $\mathcal{Z}(z)$ is at initial rest. For any block size (i.e. coding length) of $(T + 1)$, we may
equivalently generate $Z^T$ by

$$ Z^T = \mathcal{Z}_T N^T, \tag{3} $$

where $\mathcal{Z}_T$ is a $(T + 1) \times (T + 1)$ lower-triangular Toeplitz matrix of the impulse response of $\mathcal{Z}(z)$.
We may abuse the notation $\mathcal{Z}$ for both $\mathcal{Z}(z)$ and $\mathcal{Z}_T$ if no confusion arises. As a consequence, $\{Z_t\}$ is
asymptotically stationary.^2
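The block generation of the noise in (3) can be sketched numerically. The following is a minimal illustration, not the authors' code: it assumes a hypothetical first-order noise-shaping filter $(1 + b z^{-1})/(1 + a z^{-1})$ (stable and minimum-phase when $|a| < 1$ and $|b| < 1$), and the function names are invented for this example.

```python
import numpy as np

def impulse_response(b, a, length):
    """Impulse response of the assumed filter (1 + b z^-1) / (1 + a z^-1), at initial rest."""
    h = np.zeros(length)
    prev_out, prev_in = 0.0, 0.0
    for t in range(length):
        x = 1.0 if t == 0 else 0.0          # unit impulse input
        h[t] = -a * prev_out + x + b * prev_in
        prev_out, prev_in = h[t], x
    return h

def lower_toeplitz(first_col):
    """Lower-triangular Toeplitz matrix with the given first column."""
    n = len(first_col)
    M = np.zeros((n, n))
    for i in range(n):
        M[i, : i + 1] = first_col[i::-1]    # entry (i, j) equals first_col[i - j]
    return M

def colored_noise(b, a, T, rng):
    """Generate one block of colored noise as (Toeplitz matrix) @ (white noise)."""
    Z_T = lower_toeplitz(impulse_response(b, a, T + 1))
    N = rng.standard_normal(T + 1)          # white Gaussian, zero mean, unit variance
    return Z_T @ N, Z_T, N
```

Because the assumed filter has leading impulse-response coefficient 1, the resulting matrix has a unit diagonal, which matches the normalization the paper adopts later for the ISI channel.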



Fig. 1. (a) A colored Gaussian noise channel without ISI. (b) The equivalent ISI channel with AWGN. (c) State-space realization of channel $\mathcal{F}$.

Note that there is no loss of generality in assuming that $\mathcal{Z}(z)$ is stable and minimum-phase (cf.
Chapter 11, [26]), implying that the initial condition of $\mathcal{Z}(z)$ has no effect on the steady-state.
Thus we make the initial rest assumption since we mainly focus on the steady-state characterization.

B. White Gaussian channel with ISI 

The above colored Gaussian channel induces a new channel, namely a white Gaussian channel with
ISI, under a further assumption that $\mathcal{Z}(\infty) \neq 0$ (i.e. $\mathcal{Z}$ is proper but not strictly proper). More precisely,
notice that from (2) and (3), we have

$$ y^T = \mathcal{Z}_T (\mathcal{Z}_T^{-1} u^T + N^T), \tag{4} $$

^2 The difference between a stationarity assumption and an asymptotic stationarity assumption may result from different starting
points of the process: If starting from $t = -\infty$, $\{Z_t\}$ is stationary; if instead starting from $t = 0$ as we are assuming here, $\{Z_t\}$
is asymptotically stationary. They result in exactly the same steady-state analysis of the feedback communication problem.





























which we identify as a stable and minimum-phase ISI channel with AWGN $\{N_t\}$; see Fig. 1(b). Here
$\mathcal{Z}^{-1}(z)$ is also at initial rest. For any fixed $u^T$ and $N^T$, (a) and (b) generate the same channel output
$y^T$.^3 Note that $\mathcal{Z}_T^{-1}$ is the matrix inverse of $\mathcal{Z}_T$, equal to the lower-triangular Toeplitz matrix of the impulse
response of $\mathcal{Z}^{-1}(z)$.
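The equivalence between the two channel pictures can be checked numerically on a finite block. The sketch below is only an illustration under stated assumptions: the impulse response passed in is a hypothetical example, and `check_equivalence` is a name introduced here, not one from the paper.

```python
import numpy as np

def lower_toeplitz(first_col):
    """Lower-triangular Toeplitz matrix with the given first column."""
    first_col = np.asarray(first_col, dtype=float)
    n = len(first_col)
    M = np.zeros((n, n))
    for i in range(n):
        M[i, : i + 1] = first_col[i::-1]
    return M

def check_equivalence(h, seed=0):
    """Compare the colored-noise channel with the ISI channel for one random block.

    h is an assumed impulse response of the noise-shaping filter with h[0] != 0.
    """
    rng = np.random.default_rng(seed)
    Z_T = lower_toeplitz(h)
    Z_inv = np.linalg.inv(Z_T)              # matrix inverse of the Toeplitz matrix
    u = rng.standard_normal(len(h))         # arbitrary channel input
    N = rng.standard_normal(len(h))         # white noise
    y_a = u + Z_T @ N                       # picture (a): input plus colored noise
    y_b = Z_T @ (Z_inv @ u + N)             # picture (b): ISI channel output passed through Z
    return Z_inv, y_a, y_b
```

In this sketch the computed inverse is again lower-triangular and Toeplitz, which is the finite-block counterpart of saying that it collects the impulse response of the inverse filter.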

The initial rest assumption on $\mathcal{Z}^{-1}$ can be imposed in practice equivalently by first driving the initial
condition of the ISI channel to any desired value (known to the receiver) before a transmission, and
then removing the response due to that initial condition at the receiver. Such an assumption is also used
in [11], [12]. We further assume for simplicity that $\mathcal{Z}(\infty) = 1$; for cases where $g := \mathcal{Z}(\infty) \neq 1$, we
can normalize $\mathcal{Z}(z)$ by scaling it by $1/g$. Hence, $\mathcal{Z}_T$ is a lower-triangular Toeplitz matrix with diagonal
elements all equal to 1 (and thus is invertible).

We can then write the minimal state-space representation of $\mathcal{Z}^{-1}$ as $(F, G, H, 1)$, where $F \in \mathbb{R}^{m \times m}$ is
stable, $(F, G)$ is controllable, $(F, H)$ is observable, and $m$ is the dimension or order of $\mathcal{Z}^{-1}$. Let us
denote the channel from $u$ to $\tilde{y}$ in Fig. 1(b) as $\mathcal{F}$, where

$$ \tilde{y}^T := \mathcal{Z}_T^{-1} u^T + N^T = \mathcal{Z}_T^{-1} y^T. \tag{5} $$


The channel $\mathcal{F}$ is described in state-space as

$$ \text{channel } \mathcal{F}: \quad \begin{aligned} s_{t+1} &= F s_t + G u_t \\ \tilde{y}_t &= H s_t + u_t + N_t, \end{aligned} \tag{6} $$

where $s_0 = 0$; see Fig. 1(c). Notice that channel $\mathcal{F}$ is not essentially different from the channel from $u$
to $y$, since $\{y_t\}$ and $\{\tilde{y}_t\}$ causally determine each other.

We concentrate on the case $m \geq 1$; the case that $m$ is 0 (i.e., $\mathcal{F}$ is an AWGN channel) was solved
in [1], [2].
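A state-space channel of this form is straightforward to simulate. The following minimal sketch assumes a single-input single-output channel with the state matrices stored as NumPy arrays; the function name and the first-order example values in the usage note are hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate_channel(F, G, H, u, N):
    """Run s_{t+1} = F s_t + G u_t and output_t = H s_t + u_t + N_t from zero initial state.

    F is (m, m); G and H are length-m vectors; u and N are equal-length sequences.
    """
    s = np.zeros(F.shape[0])
    out = np.zeros(len(u))
    for t in range(len(u)):
        out[t] = H @ s + u[t] + N[t]        # output uses the current state and input
        s = F @ s + G * u[t]                # state update driven by the input only
    return out
```

For instance, with the assumed values `F = [[-0.5]]`, `G = [1.0]`, `H = [-0.8]`, a unit impulse input and zero noise, the output traces the impulse response of the ISI filter, starting with the direct feedthrough term 1.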


III. Problem formulation in steady-state and the solution 

Before formulating the steady-state communication problem, we distinguish among the three scenarios: 
Finite-horizon (i.e. finite coding length), infinite-horizon (i.e. infinite coding length), and steady state. 
Finite-horizon problems often have time-dependent (i.e. time-varying) and horizon-dependent solutions 
(similar to finite-horizon Kalman filtering). The horizon-dependence may be removed in the infinite- 
horizon scenario, and furthermore, the time-dependence may be removed in the steady-state scenario. If 
we find the (stationary, time-invariant) steady-state solution (which by [16] is also the infinite-horizon 
solution), we can truncate it and employ the truncation to the practical problem in finite-horizon provided 
that the horizon is large enough. This truncated solution would greatly simplify the implementation while 
having a performance sufficiently close to finite-horizon optimality. 


A. Problem formulation 

For a Gaussian channel with feedback, the channel input may take the form

$$ u_t = \gamma_t u^{t-1} + \eta_t \tilde{y}^{t-1} + \xi_t \tag{7} $$

for any $\gamma_t \in \mathbb{R}^{1 \times t}$, $\eta_t \in \mathbb{R}^{1 \times t}$, and zero-mean Gaussian random variable $\xi_t \in \mathbb{R}$ which is independent
of $u^{t-1}$ and $\tilde{y}^{t-1}$ (cf. [11], [12]). Therefore, the channel inputs are allowed to depend on the channel
outputs in a strictly causal manner. Our objective in this paper is to design an encoder/decoder to achieve
the asymptotic feedback capacity, given by

$$ C_\infty := C_\infty(\mathcal{P}) := \sup_{\substack{\{u_t\}\ \text{stationary},\ (7) \\ \text{s.t. } P_\infty := \lim_{T \to \infty} \mathbf{E} u^{T\prime} u^T/(T+1) \leq \mathcal{P}}} \ \lim_{T \to \infty} \frac{1}{T+1} I(u^T \to \tilde{y}^T), \tag{8} $$


^3 More rigorously, the mappings from $(u, N)$ to $y$ are T-equivalent. For a discussion about systems representations and
equivalence between different representations, see the appendix.



where $\mathcal{P} > 0$ is the power budget and $I(u^T \to \tilde{y}^T)$ is the directed information from $u^T$ to $\tilde{y}^T$ (cf. [11]).
For more details about $C_\infty$, refer to [12], [16] and Section VII-A in this paper.

The problem of solving $C_\infty$ may be equivalently formulated as minimizing the average channel input
power while keeping the information rate bounded from below, namely for $\mathcal{R} > 0$,

$$ P_\infty(\mathcal{R}) := \inf_{\substack{\{u_t\}\ \text{stationary},\ (7) \\ \text{s.t. } \lim_{T \to \infty} I(u^T \to \tilde{y}^T)/(T+1) \geq \mathcal{R}}} \ \lim_{T \to \infty} \frac{1}{T+1} \mathbf{E} u^{T\prime} u^T. \tag{9} $$

Therefore $P_\infty(\mathcal{R})$ is the inverse function of $C_\infty(\mathcal{P})$, i.e., $C_\infty(P_\infty(\mathcal{R})) = \mathcal{R}$.

Approach: Our approach to solve the steady-state communication problem is to investigate the finite- 
horizon problem first, and then let the horizon increase to infinity, which leads to a unified treatment of 
infinite-horizon and finite-horizon. Other approaches not pursued in this paper are also possible, such 
as applying the idea in [14] to the optimal signalling strategy in [12], though they generate results not 
as rich as the present approach does. 


B. The coding scheme 


The rest of this section presents the solution to the above problem. In this subsection, we introduce an
encoder/decoder structure and explain how to choose the parameters to ensure the optimality, and then
describe the encoding/decoding process, that is, how we assign the message to be transmitted, and how
we recover the message. In the next subsection, we present the coding theorem which states that our
encoding/decoding structure with the chosen parameters achieves $C_\infty$. The proof of the theorem will be
developed in Sections IV to VII.

The encoder/decoder structure

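In state-space, the encoder and decoder are described as

$$ \text{Encoder:} \quad \begin{cases} x_{t+1} = A x_t \\ r_t = C x_t \\ u_t = r_t - \hat{r}_t \end{cases} \tag{10} $$

and

$$ \text{Decoder:} \quad \begin{cases} \hat{s}_{t+1} = F \hat{s}_t + L_2 e_t \\ e_t = \tilde{y}_t - H \hat{s}_t \\ \hat{x}_{t+1} = A \hat{x}_t + L_1 e_t \\ \hat{r}_t = C \hat{x}_t \\ \hat{x}_{0,t} = A^{-t-1} \hat{x}_{t+1}, \end{cases} \tag{11} $$

where $\hat{s}_0 = 0$, $\hat{x}_0 = 0$, $A \in \mathbb{R}^{(n+1) \times (n+1)}$, $C \in \mathbb{R}^{1 \times (n+1)}$, $L_1 \in \mathbb{R}^{n+1}$, and $L_2 \in \mathbb{R}^m$. We call $(n + 1)$
the encoder dimension, $x_t$ the encoder state, and $\hat{x}_{0,t}$ the decoder estimate. See Fig. 2 for the block
diagram. Observe that $-\hat{r}_t$ is the feedback from the decoder based on the channel output $\tilde{y}^{t-1}$, and thus
$u_t$ depends on $\tilde{y}^{t-1}$ but not $\tilde{y}_t$. It further follows that $-\hat{r}^T = \mathcal{G}_T^* \tilde{y}^T$ for some strictly lower triangular
Toeplitz matrix $\mathcal{G}_T^*$. Here $A$, $C$, $u_t$, etc. depend on $n$, but we do not specify the dependence explicitly to
simplify notations.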

Optimal choice of parameters 

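Fix a desired rate $\mathcal{R}$. Let $DI := 2^{\mathcal{R}}$ and $n := m - 1$ (recalling that $m$ is the channel dimension), and
solve the optimization problem

$$ [a_f^{opt}, \Sigma^{opt}] := \arg \inf_{\substack{a_f \\ \text{s.t. } \Sigma = \mathbb{A} \Sigma \mathbb{A}' - \mathbb{A} \Sigma \mathbb{C}' \mathbb{C} \Sigma \mathbb{A}'/(\mathbb{C} \Sigma \mathbb{C}' + 1)}} \mathbb{D} \Sigma \mathbb{D}', \tag{12} $$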







Fig. 2. The encoder/decoder structure for $\mathcal{F}$.


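where

$$ \mathbb{A} := \begin{bmatrix} A & 0 \\ GC & F \end{bmatrix}, \quad \mathbb{C} := [C \ \ H], \quad \mathbb{D} := [C \ \ 0], \quad A := \begin{bmatrix} 0_{n \times 1} & I_n \\ \pm DI & a_f' \end{bmatrix}, \quad C := [1 \ \ 0_{1 \times n}]. \tag{13} $$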


Note that we need to solve (12) twice (once for $+DI$ in $A$ and once for $-DI$ in $A$), and choose the
optimal solution as the one with the smaller objective function value. Then we form the optimal $A^{opt}$
based on $a_f^{opt}$, and let $(n^* + 1)$ be the number of unstable eigenvalues in $A^{opt}$, where $n^* \geq 0$.

Now let $n := n^*$, solve (12) again, and obtain a new $a_f^{opt}$ and $\Sigma^{opt}$. Then form $A^{opt}$, let $A^* := A^{opt}$,
$\Sigma^* := \Sigma^{opt}$, $C^* := [1, 0_{1 \times n^*}]$, and form $\mathbb{A}^*$, $\mathbb{C}^*$, and $\mathbb{D}^*$. Let

$$ L^* := [L_1^{*\prime}, L_2^{*\prime}]' := \frac{\mathbb{A}^* \Sigma^* \mathbb{C}^{*\prime}}{\mathbb{C}^* \Sigma^* \mathbb{C}^{*\prime} + 1}. \tag{14} $$

As we will show, $(A^*, C^*)$ is observable, and $A^*$ has exactly $(n^* + 1)$ unstable eigenvalues.
We assign the encoder/decoder parameters to the scheme built in Fig. 2 by letting

$$ n := n^*, \quad A := A^*, \quad C := C^*, \quad L_1 := L_1^*, \quad L_2 := L_2^*. \tag{15} $$

We then drive the initial condition $s_0$ of channel $\mathcal{F}$ to zero. Now we are ready to communicate at a rate
$\mathcal{R}$ using power $P_\infty(\mathcal{R}) = \mathbb{D}^* \Sigma^* \mathbb{D}^{*\prime}$.^4
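To make the role of the Riccati constraint and the resulting gain concrete, here is a minimal sketch for the memoryless special case only. It assumes, purely for illustration, that with no channel state the stacked matrices collapse to scalars (a single unstable encoder eigenvalue $a = 2^{\mathcal{R}}$ and output gain 1); this is the AWGN setting that the paper attributes to [1], [2], not the general design for channels with memory, and the function names are invented here.

```python
import math

def riccati_fixed_point(a, iters=200):
    """Iterate the scalar Riccati recursion sigma <- a*sigma*a - (a*sigma)^2 / (sigma + 1)."""
    sigma = 1.0
    for _ in range(iters):
        sigma = a * sigma * a - (a * sigma) ** 2 / (sigma + 1.0)
    return sigma

def awgn_design(rate_bits):
    """Scalar design for an assumed memoryless channel at the given rate in bits."""
    a = 2.0 ** rate_bits                    # degree of instability, one unstable eigenvalue
    sigma = riccati_fixed_point(a)          # in the scalar case this is also the input power
    gain = a * sigma / (sigma + 1.0)        # scalar analogue of the Kalman gain
    return sigma, gain
```

Under these assumptions the fixed point satisfies $\sigma + 1 = a^2$, so the power needed for rate $\mathcal{R}$ is $2^{2\mathcal{R}} - 1$, which agrees with the familiar AWGN relation $\mathcal{R} = \frac{1}{2}\log_2(1 + P)$.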

Encoding/Decoding process 

1) Transmission of analog source: The designed communication system can transmit either an analog
source or a digital message. In the former case, we assume that the encoder wishes to convey a Gaussian
random vector through the channel and the decoder wishes to learn the random vector, which is a
rate-distortion problem (or successive refinement problem, see e.g. [13], [27], [28]). The coding process is
as follows. Assume that the to-be-conveyed message $W$ is distributed as $\mathcal{N}(0, I_{n^*+1})$ (noting that any
non-degenerate $(n^* + 1)$-variate Gaussian vector $W$ can be transformed into this form). Assume that the
coding length is $(T + 1)$. To encode, let $x_0 := W$. Then run the system till time epoch $T$, obtaining
$\hat{x}_{0,t}$, $t = 0, 1, \ldots, T$. To decode, let $\hat{W}_t := \hat{x}_{0,t}$ for $t = 0, 1, \ldots, T$.

The quantities of interest include the squared-error distortion, defined as

$$ \mathrm{MSE}(\hat{W}_t) := \mathbf{E}(W - \hat{W}_t)(W - \hat{W}_t)'. \tag{16} $$

It will become clear that $\mathrm{MSE}(\hat{W}_t)$ can be pre-computed before the transmission, and thus the coding
length can be determined a priori to ensure a desired distortion level.

^4 We see from (12) that for any channel $\mathcal{F}$, a simple upper bound of the function $P_\infty(\mathcal{R})$ is given by
$\min\{(2^{2\mathcal{R}} - 1)(\mathcal{Z}(2^{\mathcal{R}}))^2, (2^{2\mathcal{R}} - 1)(\mathcal{Z}(-2^{\mathcal{R}}))^2\}$, obtained by using one unstable eigenvalue in $A$.



















































































2) Transmission of digital message: To transmit digital messages over the communication system, let
us first fix $\epsilon > 0$ small enough and the coding length $(T + 1)$ large enough. Let

$$ \Sigma_x^* := [I_{n^*+1}, 0]\, \Sigma^*\, [I_{n^*+1}, 0]'. \tag{17} $$

Assume that the matrix $(A^{*\prime})^{-T-1} \Sigma_x^* (A^*)^{-T-1}$ has an eigenvalue decomposition as

$$ (A^{*\prime})^{-T-1} \Sigma_x^* (A^*)^{-T-1} = E_T \Lambda_T E_T', \tag{18} $$

where $E_T = [e_T^{(0)}, \ldots, e_T^{(n^*)}]$ is an orthonormal matrix and $\Lambda_T$ is a positive diagonal matrix. Let $\sigma_{T,i}$
be the square root of the $(i, i)$th element of $\Lambda_T$. Let $\mathcal{B} \subset \mathbb{R}^{n^*+1}$ be the unit hypercube spanned by
columns of $E_T$, that is,

$$ \mathcal{B} := \left\{ \sum_{i=0}^{n^*} \alpha_i e_T^{(i)} \;:\; \alpha_i \in \left[-\tfrac{1}{2}, \tfrac{1}{2}\right] \right\}. \tag{19} $$

Next we partition the $i$th side of $\mathcal{B}$ into $(\sigma_{T,i})^{-(1-\epsilon)}$ segments. This induces a partition of $\mathcal{B}$ into $M_T$
sub-hypercubes, where

$$ M_T = \prod_{i=0}^{n^*} (\sigma_{T,i})^{-(1-\epsilon)} = \left[ \det\left( (A^{*\prime})^{-T-1} \Sigma_x^* (A^*)^{-T-1} \right) \right]^{-\frac{1-\epsilon}{2}}. \tag{20} $$

We then map the sub-hypercube centers to a set of $M_T$ equally likely messages. The above procedure
is known to both the transmitter and receiver a priori.

Suppose now we wish to transmit the message represented by the center $W$. To encode, let $x_0 := W$.
Then run the system till time epoch $T$. To decode, we map $\hat{x}_{0,T}$ into the closest sub-hypercube center
and obtain the decoded message $\hat{W}_T$. We declare an error if $\hat{W}_T \neq W$, and call an (asymptotic) rate

$$ \mathcal{R} := \lim_{T \to \infty} \frac{1}{T+1} \log M_T \tag{21} $$

achievable if the probability of error $PE_T$ vanishes as $T$ tends to infinity. We remark that this coding
process is the one used in [14] for Gaussian channels with memory, which was an extension of the
SK codes. In fact, the original SK coding scheme can be rewritten in a Kalman filter form, and hence
it essentially implements the Kalman filtering algorithm. We also remark that, similar to the analog
transmission case, the coding length $(T + 1)$ can be pre-determined.

As we have seen, the encoder/decoder design and the encoding/decoding process can be done rather
easily. The computation complexity for encoding/decoding grows as $O(T + 1)$. Also interestingly, the
encoder may be viewed as a control system, and the decoder may be viewed as an estimation system,
as pointed out by Sanjoy Mitter and in [13], [29].


C. Coding theorem 

Theorem 1. Construct the encoder/decoder shown in Fig. 2 using $n^*$, $A^*$, $C^*$, $L_1^*$, and $L_2^*$. Then under
the power constraint $\mathbf{E} u^2 \leq \mathcal{P}$,

i) The coding scheme transmits an analog source $W \sim \mathcal{N}(0, I_{n^*+1})$ from the encoder to the decoder
at rate $C_\infty(\mathcal{P})$, with MSE distortion $\mathrm{MSE}(\hat{W}_T)$ achieving the optimal asymptotic rate-distortion tradeoff
given by

$$ \mathcal{R} = \lim_{T \to \infty} \frac{1}{2(T+1)} \log \frac{1}{\det \mathrm{MSE}(\hat{W}_T)}. \tag{22} $$

ii) The coding scheme can transmit digital messages from the encoder to the decoder at a rate arbitrarily
close to $C_\infty(\mathcal{P})$, with $PE_T$ decaying to zero doubly exponentially.






The proof of the theorem will be developed in the subsequent four sections. In Section IV we
consider a general coding structure in finite-horizon which may be viewed as a generalization of our
optimal coding structure. We show that this general structure essentially contains a Kalman filter. The
presence of the Kalman filter links the feedback communication problem to an estimation problem and
a control problem, and hence we rewrite the information rate in terms of estimation theory quantities
and control theory quantities; see Section V. Sections IV and V are focused on finite-horizon. In Section
VI we extend the horizon to infinity and characterize the steady-state behavior. Then in Section VII we
show that our optimal encoder/decoder design is actually the solution to the steady-state communication
problem.

IV. Necessity of Kalman filter for optimal coding 

In this section, we consider a finite-horizon coding structure that includes our optimal design in Section
III as a special case. This general structure is useful since: 1) searching over all possible parameters in the
general structure achieves $C_\infty$, that is, there is no loss of generality or optimality in focusing on this structure
only; 2) we can show that to ensure power efficiency (to be explained), the general structure necessarily
contains a Kalman filter. The general coding structure is in fact a variation of the CP structure (see
Appendix II-D), and hence our Kalman filter characterization leads to a refinement of the CP structure.

A. A general coding structure 

Fig. 3 illustrates the general coding structure, including the encoder and the feedback generator, a
portion of the decoder. Below, we fix the time horizon to be $\{0, 1, \ldots, T\}$ and describe the coding
structure.


Fig. 3. A general coding structure for channel $\mathcal{F}$.

Encoder: The encoder follows the dynamics (10). We assume that the encoder dimension $(n + 1)$
satisfies $0 \leq n \leq T$, $W \sim \mathcal{N}(0, I_{n+1})$, $A \in \mathbb{R}^{(n+1) \times (n+1)}$, $C \in \mathbb{R}^{1 \times (n+1)}$, $(A, C)$ is observable, and
none of the eigenvalues of $A$ are on the unit circle or at the locations of the eigenvalues of $F$. We then
let

$$ \begin{aligned} \Gamma_n(A, C) &:= \Gamma_n := [C', A'C', \ldots, A^{n\prime} C']' \\ \Gamma(A, C) &:= \Gamma := [C', A'C', \ldots, A^{T\prime} C']' \\ K_r^{(T)}(A, C) &:= K_r^{(T)} := \mathbf{E}\, r^T r^{T\prime}. \end{aligned} \tag{23} $$

Therefore, $\Gamma_n$ is the observability matrix for $(A, C)$ and is invertible, $\Gamma$ has rank $(n + 1)$, $r^T = \Gamma W$,
and $K_r^{(T)} = \Gamma \Gamma'$.
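The stacked matrix $\Gamma$ and the resulting covariance can be formed directly from $(A, C)$. The sketch below is only an illustration; `observability_stack` is a name introduced here, and the matrices in the usage note are hypothetical example values rather than a design from the paper.

```python
import numpy as np

def observability_stack(A, C, T):
    """Stack the rows C A^t for t = 0..T, so that the encoder output block is Gamma @ W."""
    row = np.asarray(C, dtype=float).reshape(1, -1)
    rows = []
    for _ in range(T + 1):
        rows.append(row.ravel())
        row = row @ A                       # next row is the previous row times A
    return np.vstack(rows)

def signal_covariance(A, C, T):
    """Covariance of the encoder output block when W has identity covariance."""
    Gamma = observability_stack(A, C, T)
    return Gamma @ Gamma.T
```

For example, with the assumed companion-form pair `A = [[0, 1], [2, 0.5]]` and `C = [1, 0]`, the stack has full column rank 2 for any horizon with at least two rows, consistent with observability of the pair.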

Feedback generator: The feedback signal $-\hat{r}_t$ is generated through the feedback generator $\mathcal{G}_T$, i.e.

$$ -\hat{r}^T = \mathcal{G}_T \tilde{y}^T. \tag{24} $$









































We assume that $\mathcal{G}_T \in \mathbb{R}^{(T+1) \times (T+1)}$ is a strictly lower triangular matrix. Clearly, the optimal
encoder/decoder can be viewed as a special case of the general structure. Throughout the paper, the above
assumptions on the encoder/decoder are always assumed unless otherwise specified. For future use,
we compute the channel output as

$$ \tilde{y}^T = (I - \mathcal{Z}_T^{-1} \mathcal{G}_T)^{-1} (\mathcal{Z}_T^{-1} r^T + N^T). \tag{25} $$

Definition 1. Consider the general coding structure shown in Fig. 3. Define

$$ C_{T,n} := C_{T,n}(\mathcal{P}) := \sup_{\substack{A \in \mathbb{R}^{(n+1) \times (n+1)},\ C,\ \mathcal{G}_T \\ \text{s.t. } \mathbf{E} u^{T\prime} u^T/(T+1) \leq \mathcal{P}}} \frac{1}{T+1} I(W; \tilde{y}^T), \tag{26} $$

and define its inverse function as $P_{T,n}(\mathcal{R})$.


In other words, $C_{T,n}$ is the finite-horizon information capacity for a fixed transmitter dimension $n$. It
holds that $C_{n,n} = C_n$ and hence $\lim_{n \to \infty} C_{n,n} = C_\infty$ (see Lemma 1 and Appendix II-B). Moreover,
as we will show, can be achieved using this structure. 

B. The presence of Kalman filter 

We first compute the mutual information in the general coding structure. 

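Proposition 1. Consider the general coding structure in Fig. 3. Fix any $0 \leq n \leq T$, and fix any $A$, $C$,
and $\mathcal{G}_T$. Then it holds that

$$ \begin{aligned} I(W; \tilde{y}^T) &= I(r^T; \tilde{y}^T) \\ &= I(u^T \to \tilde{y}^T) \\ &= \tfrac{1}{2} \log \det K_{\tilde{y}}^{(T)} \qquad\qquad (27) \\ &= \tfrac{1}{2} \log \det (I + \mathcal{Z}_T^{-1} K_r^{(T)} \mathcal{Z}_T^{-1\prime}) \\ &= \tfrac{1}{2} \log \det (I + \mathcal{Z}_T^{-1} \Gamma \Gamma' \mathcal{Z}_T^{-1\prime}), \qquad (28) \end{aligned} $$

which is independent of $\mathcal{G}_T$.

Proof:

$$ \begin{aligned} I(W; \tilde{y}^T) &= h(\tilde{y}^T) - h(\tilde{y}^T \mid W) \\ &= h(\tilde{y}^T) - h\big((I - \mathcal{Z}_T^{-1} \mathcal{G}_T)^{-1} (\mathcal{Z}_T^{-1} r^T + N^T) \mid W\big) \\ &\overset{(a)}{=} \tfrac{1}{2} \log \det (2 \pi e K_{\tilde{y}}^{(T)}) - h(N^T) \\ &\overset{(b)}{=} I(u^T \to \tilde{y}^T) \\ &= \tfrac{1}{2} \log \det K_{\tilde{y}}^{(T)} \\ &= \tfrac{1}{2} \log \det (I + \mathcal{Z}_T^{-1} K_r^{(T)} \mathcal{Z}_T^{-1\prime}), \end{aligned} $$

where (a) is due to $r^T = \Gamma W$, $\det(AB) = \det A \det B$, and $\det(I - \mathcal{Z}_T^{-1} \mathcal{G}_T)^{-1} = 1$; and (b) follows
from [14]. □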
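The key determinant step in the proof, namely that a strictly lower triangular feedback generator leaves the log-determinant unchanged, can be checked numerically. The sketch below uses randomly drawn stand-ins for the inverse channel matrix, the feedback generator and the pair $(A, C)$; all of these values and the function name are assumptions made for illustration only.

```python
import numpy as np

def logdet_check(T=5, n=1, seed=0):
    """Compare 0.5*logdet of the output covariance with and without feedback."""
    rng = np.random.default_rng(seed)
    size = T + 1
    # Assumed unit-diagonal lower-triangular Toeplitz matrix standing in for the inverse channel.
    g = np.concatenate(([1.0], 0.5 * rng.standard_normal(T)))
    Z_inv = np.zeros((size, size))
    for i in range(size):
        Z_inv[i, : i + 1] = g[i::-1]
    # Arbitrary strictly lower-triangular feedback generator.
    G = 0.5 * np.tril(rng.standard_normal((size, size)), k=-1)
    # Signal covariance from a randomly drawn (A, C) pair.
    A = 0.8 * rng.standard_normal((n + 1, n + 1))
    C = rng.standard_normal((1, n + 1))
    Gamma = np.vstack([C @ np.linalg.matrix_power(A, t) for t in range(size)])
    K_r = Gamma @ Gamma.T
    # Output covariance with feedback: M (Z_inv K_r Z_inv' + I) M'.
    M = np.linalg.inv(np.eye(size) - Z_inv @ G)
    K_y = M @ (Z_inv @ K_r @ Z_inv.T + np.eye(size)) @ M.T
    with_feedback = 0.5 * np.linalg.slogdet(K_y)[1]
    without_feedback = 0.5 * np.linalg.slogdet(np.eye(size) + Z_inv @ K_r @ Z_inv.T)[1]
    det_factor = np.linalg.det(np.eye(size) - Z_inv @ G)
    return with_feedback, without_feedback, det_factor
```

The product of a lower-triangular matrix with a strictly lower-triangular one is strictly lower triangular, so the matrix being inverted has a unit diagonal and determinant 1; this is why the two log-determinants coincide for any such feedback generator.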

Proposition 1 implies that $I(W; \tilde{y}^T)$ is independent of the feedback generator $\mathcal{G}_T$, and dependent
only on $K_r^{(T)}$, or equivalently on $(A, C)$. Thus, fixed $(A, C)$ implies fixed information rate, and hence
the optimal feedback generator has to be chosen to minimize the average channel input power, which
turns out to contain a Kalman filter. Note that the counterpart of this proposition in infinite-horizon was
proven in [14]. Now we can define, for a fixed $(A, C)$, the information rate across the channel to be

$$ R_T(A, C) := \frac{I(W; \tilde{y}^T)}{T+1}. \tag{29} $$

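The optimal feedback generator for a given $(A, C)$ is found in the next proposition.

Proposition 2. Consider the general coding structure in Fig. 3. Fix any $0 \leq n \leq T$. Then

i)

$$ P_{T,n}(\mathcal{R}) = \inf_{\substack{A,\ C,\ \mathcal{G}_T := \mathcal{G}_T^*(A, C) \\ \text{s.t. } R_T(A, C) \geq \mathcal{R}}} \frac{1}{T+1} \mathbf{E} u^{T\prime} u^T, \tag{30} $$

where $\mathcal{G}_T^*(A, C)$ is the optimal feedback generator for a given $(A, C)$, defined as

$$ \mathcal{G}_T^*(A, C) := \arg \inf_{(A, C)\ \text{fixed},\ \mathcal{G}_T} \frac{1}{T+1} \mathbf{E} u^{T\prime} u^T. \tag{31} $$

ii) The optimal feedback generator $\mathcal{G}_T^*(A, C)$ is given by

$$ \mathcal{G}_T^*(A, C) = -\widehat{\mathcal{G}}_T^*(A, C) \big(I - \mathcal{Z}_T^{-1} \widehat{\mathcal{G}}_T^*(A, C)\big)^{-1}, \tag{32} $$

where $\widehat{\mathcal{G}}_T^*(A, C)$ is the strictly causal MMSE estimator (Kalman filter) of $r^T$ given the noisy observation
$\bar{y}^T := \mathcal{Z}_T^{-1} r^T + N^T$, i.e.,

$$ \widehat{\mathcal{G}}_T^*(A, C) := \arg \inf_{\widehat{\mathcal{G}}_T \in \mathbb{R}^{(T+1) \times (T+1)}} \frac{1}{T+1} \mathbf{E} (r^T - \widehat{\mathcal{G}}_T \bar{y}^T)(r^T - \widehat{\mathcal{G}}_T \bar{y}^T)', \tag{33} $$

where $\widehat{\mathcal{G}}_T$ is strictly lower triangular. See Fig. 4(a) for the associated estimation problem, (b) for the
Kalman filter $\widehat{\mathcal{G}}_T^*(A, C)$, and (c) for the optimal feedback generator $\mathcal{G}_T^*(A, C)$.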


Remark 1. Proposition 2 reveals that the minimization of channel input power in a feedback
communication problem is equivalent to the minimization of MSE in an estimation problem. This equivalence
yields a complete characterization (in terms of the Kalman filter) of the optimal feedback generator $\mathcal{G}_T^*(A, C)$
for a given $(A, C)$. Since our general coding structure is a variation of the CP structure, this proposition
leads to the Kalman filter based characterization of the CP structure and hence is an improvement of
the Cover-Pombra formulation; see Appendix II-D.

Remark 2. Proposition 2 i) implies that we may reformulate the problem of $C_{T,n}$ (or $P_{T,n}$) as a
two-step problem: In step 1, we fix $(A, C)$, i.e. fix the rate, and minimize the input power by searching
over $\mathcal{G}_T$; in step 2, we search over all possible $(A, C)$ subject to the rate constraint. The role of the
feedback generator $\mathcal{G}_T$ for any fixed $(A, C)$ is to minimize the input power. Then ii) solves for the optimal
feedback generator $\mathcal{G}_T^*(A, C)$ by considering the equivalent optimal estimation problem in Fig. 4(a),
whose solution is the Kalman filter. Notice that the Kalman filter can also give us the optimal estimate
of the message $W$. Hence, the Kalman filter leads to both power efficiency and the best estimate of the
message. The power efficiency is ensured by the one-step prediction operation of the Kalman filtering,
and the optimal recovery of the message is ensured by the smoothing operation of the Kalman filtering;
therefore, we obtain the optimality of Kalman filtering in the information transmission sense. We finally
note that the necessity of the Kalman filter is not surprising given the previous indications in [2], [5],
[11], [13], [24], etc.

Proof: i) Notice that for any fixed $(A, C)$, $R_T(A, C)$ is fixed. Then from the definition of $P_{T,n}(\mathcal{R})$, we have

$$ P_{T,n}(\mathcal{R}) = \inf_{\substack{A,\, C,\, \mathcal{G}_T \\ \text{s.t. } R_T(A,C) \ge \mathcal{R}}} \frac{1}{T+1}\, \mathbf{E}\, u^{T\prime} u^T = \inf_{\substack{A,\, C \\ \text{s.t. } R_T(A,C) \ge \mathcal{R}}}\ \inf_{(A,C)\ \text{fixed},\ \mathcal{G}_T} \frac{1}{T+1}\, \mathbf{E}\, u^{T\prime} u^T. \tag{34} $$

Then i) follows from the definition of $\mathcal{G}^*_T(A, C)$.

Fig. 4. (a) An estimation problem over channel $\mathcal{F}$. (b) The Kalman filter $\widehat{\mathcal{G}}^*_T(A, C)$. (c) The Kalman filter based feedback generator $\mathcal{G}^*_T(A, C)$. Here $(A, L_{1,t}, -C, 0)$ with $\hat{x}_t$ denotes a state-space representation with $\hat{x}_t$ being its state at time $t$, and $\hat{x}_0$ being 0; see (43) and (44) for $L_{1,t}$ and $L_{2,t}$.


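ii) Note that for the general coding structure, it holds that

$$ u^T = r^T + (-\hat{r}^T) = r^T + \mathcal{G}_T y^T. \tag{35} $$

Then, letting

$$ \widehat{\mathcal{G}}_T := -\mathcal{G}_T (I - Z_T^{-1}\mathcal{G}_T)^{-1} \tag{36} $$

and $\bar{y}^T := Z_T^{-1} r^T + N^T$, we have $\mathcal{G}_T y^T = -\widehat{\mathcal{G}}_T \bar{y}^T$. Therefore,

$$
\begin{aligned}
\widehat{\mathcal{G}}^*_T(A, C) &= \arg\inf_{\widehat{\mathcal{G}}_T} \frac{1}{T+1}\, \mathbf{E}\, (r^T + \mathcal{G}_T y^T)(r^T + \mathcal{G}_T y^T)' \\
&= \arg\inf_{\widehat{\mathcal{G}}_T} \frac{1}{T+1}\, \mathbf{E}\, (r^T - \widehat{\mathcal{G}}_T \bar{y}^T)(r^T - \widehat{\mathcal{G}}_T \bar{y}^T)'.
\end{aligned} \tag{37}
$$

The last equality implies that the optimal solution $\widehat{\mathcal{G}}^*_T$ is the strictly causal MMSE estimator (with one-step prediction) of $r^T$ given $\bar{y}^T$; notice that $\widehat{\mathcal{G}}_T$ is strictly lower triangular. It is well known that such an estimator can be implemented recursively in state-space as a Kalman filter (cf. [30], [31]). Finally, from the relation between $\widehat{\mathcal{G}}_T$ and $\mathcal{G}_T$, we obtain (32). The state-space representation of $\mathcal{G}^*_T(A, C)$ needs only a straightforward computation, as shown in Appendix I. □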
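The change of variables used in this proof can be checked numerically. The following is a minimal sketch, assuming a randomly generated causal channel matrix and feedback generator (the matrices, horizon, and variable names are hypothetical and not taken from the paper). It checks that the map (36) is undone by a map of the form (32), that $\widehat{\mathcal{G}}_T$ stays strictly lower triangular, and that the channel input in the feedback loop equals the estimation error $r^T - \widehat{\mathcal{G}}_T \bar{y}^T$.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 5                      # hypothetical horizon
n = T + 1
I = np.eye(n)

def strictly_lower(m):
    """Keep only the strictly lower-triangular part (strict causality)."""
    return np.tril(m, k=-1)

# Hypothetical causal channel matrix Z_T^{-1}: lower triangular, unit diagonal.
Z_inv = I + 0.5 * strictly_lower(rng.normal(size=(n, n)))
# Arbitrary strictly causal feedback generator G_T (an assumption for the demo).
G = 0.5 * strictly_lower(rng.normal(size=(n, n)))

# Eq. (36): G_hat := -G (I - Z^{-1} G)^{-1}
G_hat = -G @ np.linalg.inv(I - Z_inv @ G)
# Inverse map with the same form as Eq. (32)
G_back = -G_hat @ np.linalg.inv(I - Z_inv @ G_hat)

# One realization of the source signal r^T and the channel noise N^T.
r = rng.normal(size=n)
N = rng.normal(size=n)
y_bar = Z_inv @ r + N                       # observation without feedback
y = np.linalg.solve(I - Z_inv @ G, y_bar)   # closed loop: y = Z^{-1}(r + G y) + N

u_feedback = r + G @ y              # channel input in the coding structure, Eq. (35)
u_estimation = r - G_hat @ y_bar    # estimation error in the estimation problem

print(np.allclose(G_back, G), np.allclose(u_feedback, u_estimation))
```

Because the two signals coincide sample by sample, minimizing the input power over $\mathcal{G}_T$ and minimizing the MSE over $\widehat{\mathcal{G}}_T$ are the same optimization, which is the content of Proposition 2 ii).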

We remark that it is possible to derive a dynamic programming based solution ([11]) to compute $C_{T,n}$, and if we further employ the Markov property in [12] and the above Kalman filter based characterization, we would reach a solution with complexity $O(T)$ for computing $C_{T,n}$ and $C_T$. However, we do not pursue this line here, since it is beyond the main scope of this paper.

V. Feedback rate, CRB, and Bode integral 

We have shown that in the general coding structure, to ensure power efficiency for a fixed (A, C ), 
we need to design a Kalman-filter based feedback generator. The Kalman filter immediately links the 
feedback communication problem to estimation and control problems. In this section, we present a 
unified representation for the general coding structure (with $\mathcal{G}_T$ chosen as $\mathcal{G}^*_T(A, C)$), its estimation
theory counterpart, and its control theory counterpart. Then we will establish connections among the 
information theory quantities, estimation theory quantities, and control theory quantities. 


A. Unified representation of feedback coding system, Kalman filter, and minimum-energy control 


In this subsection, we will present the dynamics for the estimation problem and the general coding 
structure, then show that they are governed by one set of equations, which may also be viewed as a 
control system. 

The estimation system 

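The estimation system in Fig. 4 consists of three parts: the unknown source $r^T$ to be estimated or tracked, the channel $\mathcal{F}$ (without output feedback), and the estimator, which we choose as the Kalman filter $\widehat{\mathcal{G}}^*_T$; we assume that $(A, C)$ is fixed and known to the estimator. The system is described in state-space as

$$
\text{estimation system:}\quad
\begin{aligned}
&\text{unknown source:} & x_{t+1} &= A x_t, & r_t &= C x_t, \\
&\text{channel } \mathcal{F}\text{:} & s_{t+1} &= F s_t + G r_t, & \bar{y}_t &= H s_t + r_t + N_t, \\
&\text{Kalman filter } \widehat{\mathcal{G}}^*_T(A, C)\text{:} & \hat{x}_{t+1} &= A \hat{x}_t + L_{1,t} e_t, & \hat{r}_t &= C \hat{x}_t, \\
& & \hat{s}_{t+1} &= F \hat{s}_t + G \hat{r}_t + L_{2,t} e_t, & e_t &= \bar{y}_t - H \hat{s}_t - \hat{r}_t,
\end{aligned} \tag{38}
$$

with $x_0 = W$, $s_0 = \hat{s}_0 = 0$, and $\hat{x}_0 = 0$. Here $L_{1,t} \in \mathbb{R}^{n+1}$ and $L_{2,t} \in \mathbb{R}^m$ are the time-varying Kalman filter gains specified in (43).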

The general coding structure with the optimal feedback generator 




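The optimal feedback generator for a given $(A, C)$ is solved in (32); see Fig. 4(c) for its structure. We can then obtain the minimal state-space representation of $\mathcal{G}^*_T(A, C)$, and describe the general coding structure with $\mathcal{G}^*_T(A, C)$ as

$$
\text{general coding structure:}\quad
\begin{aligned}
&\text{encoder:} & x_{t+1} &= A x_t, \quad r_t = C x_t, & u_t &= r_t - \hat{r}_t, \\
&\text{channel } \mathcal{F}\text{:} & s_{t+1} &= F s_t + G u_t, & y_t &= H s_t + u_t + N_t, \\
&\text{optimal feedback generator } \mathcal{G}^*_T(A, C)\text{:} & \hat{s}_{t+1} &= F \hat{s}_t + L_{2,t} e_t, & e_t &= y_t - H \hat{s}_t, \\
& & \hat{x}_{t+1} &= A \hat{x}_t + L_{1,t} e_t, & -\hat{r}_t &= -C \hat{x}_t,
\end{aligned} \tag{39}
$$

with $x_0 = W$, $s_0 = \hat{s}_0 = 0$, and $\hat{x}_0 = 0$. See Appendix I for the derivation of the minimal state-space representation of $\mathcal{G}^*_T(A, C)$. It can be easily shown that $r_t$, $\hat{r}_t$, $e_t$, $x_t$, and $\hat{x}_t$ in (38) and (39) are equal, respectively, and it holds that

$$ (s_t - \hat{s}_t)\ \text{in (38)} \;=\; (s_t - \hat{s}_t)\ \text{in (39)}. \tag{40} $$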


The unified representation 

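Define

$$
\mathbb{X}_t := \begin{bmatrix} \tilde{x}_t \\ \tilde{s}_t \end{bmatrix} := \begin{bmatrix} x_t - \hat{x}_t \\ s_t - \hat{s}_t \end{bmatrix}, \quad
\mathbb{X}_0 := \begin{bmatrix} W \\ 0 \end{bmatrix}, \quad
\mathbb{A} := \begin{bmatrix} A & 0 \\ GC & F \end{bmatrix}, \quad
\mathbb{C} := \begin{bmatrix} C & H \end{bmatrix}, \quad
\mathbb{D} := \begin{bmatrix} C & 0 \end{bmatrix}, \quad
L_t := \begin{bmatrix} L_{1,t} \\ L_{2,t} \end{bmatrix}. \tag{41}
$$

Note that $\mathbb{X}_t$ is the estimation error for $[x_t', s_t']'$. Substituting (41) into (38) and (39), we obtain that both systems become

$$
\text{control system:}\quad
\begin{aligned}
\mathbb{X}_{t+1} &= (\mathbb{A} - L_t \mathbb{C})\mathbb{X}_t - L_t N_t = \mathbb{A}\mathbb{X}_t - L_t e_t, \\
e_t &= \mathbb{C}\mathbb{X}_t + N_t, \\
u_t &= \mathbb{D}\mathbb{X}_t;
\end{aligned} \tag{42}
$$

see Fig. 5 for its block diagram. It is a control system in which we want to minimize the power of $u$ by appropriately choosing $L_t$. This is a minimum-energy control problem, which is useful for characterizing the steady-state solution, and it is equivalent to the Kalman filtering problem (see [32]).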

The signal $e_t$ in (42) is called the Kalman filter innovation or innovation,^5 which plays a significant role in Kalman filtering. One fact is that $\{e_t\}$ is a white process, that is, its covariance matrix $K_e^{(T)}$ is a diagonal matrix. Another fact is that $e^T$ and $y^T$ determine each other causally, and we can easily verify that $h(e^T) = h(y^T)$ and $\det K_y^{(T)} = \det K_e^{(T)}$. We remark that (42) is the innovations representation of the Kalman filter (cf. [31]).


^5 The innovation defined here is different from the innovation defined in [6] or [12].













Fig. 5. The block diagram for the minimum-energy control system. Here the block $(A, -L_{1,t}, C, 0)$ with $\tilde{x}_t$ denotes the state-space representation with $\tilde{x}_t$ and $W$ being its state at time $t$ and at time 0.


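For each $t$, the optimal $L_t$ is determined as

$$ L_t := \begin{bmatrix} L_{1,t} \\ L_{2,t} \end{bmatrix} := \frac{\mathbb{A}\Sigma_t\mathbb{C}'}{K_{e,t}}, \tag{43} $$

where $\Sigma_t := \mathbf{E}\,\mathbb{X}_t\mathbb{X}_t'$, $K_{e,t} := \mathbf{E}(e_t)^2 = \mathbb{C}\Sigma_t\mathbb{C}' + 1$, and the error covariance matrix $\Sigma_t$ satisfies the Riccati recursion

$$ \Sigma_{t+1} = \mathbb{A}\Sigma_t\mathbb{A}' - \frac{\mathbb{A}\Sigma_t\mathbb{C}'\mathbb{C}\Sigma_t\mathbb{A}'}{\mathbb{C}\Sigma_t\mathbb{C}' + 1} \tag{44} $$

with initial condition

$$ \Sigma_0 := \begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix}. \tag{45} $$

This completes the description of the optimal feedback generator for a given $(A, C)$.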
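The gain and Riccati recursion above translate directly into a short loop. The following is a minimal sketch, assuming unit-variance white channel noise as in the text; the toy values of $A$, $C$, $F$, $G$, $H$ are hypothetical and chosen only so that $A$ has a single unstable eigenvalue at 2. The function name `riccati_recursion` is likewise an illustration, not the authors' implementation.

```python
import numpy as np

def riccati_recursion(A, C, F, G, H, T):
    """Run the Riccati recursion (44) from the initial condition (45).

    Returns the innovation variances K_{e,t}, the gains L_t of (43),
    and the error covariance after the last step.
    """
    n1, m = A.shape[0], F.shape[0]
    # Augmented matrices of (41): AA = [[A, 0], [G C, F]], CC = [C H].
    AA = np.block([[A, np.zeros((n1, m))], [G @ C, F]])
    CC = np.hstack([C, H])
    Sigma = np.zeros((n1 + m, n1 + m))
    Sigma[:n1, :n1] = np.eye(n1)            # Sigma_0 = diag(I, 0), Eq. (45)
    K_e, gains = [], []
    for _ in range(T + 1):
        k = (CC @ Sigma @ CC.T).item() + 1.0     # K_{e,t} = CC Sigma CC' + 1
        L = AA @ Sigma @ CC.T / k                # Eq. (43)
        K_e.append(k)
        gains.append(L)
        Sigma = AA @ Sigma @ AA.T - k * (L @ L.T)   # Eq. (44)
        Sigma = 0.5 * (Sigma + Sigma.T)             # guard against round-off asymmetry
    return np.array(K_e), gains, Sigma

# Hypothetical toy system: scalar unstable source, first-order stable channel.
A = np.array([[2.0]])
C = np.array([[1.0]])
F = np.array([[0.5]])
G = np.array([[1.0]])
H = np.array([[0.3]])

K_e, gains, Sigma = riccati_recursion(A, C, F, G, H, T=60)
rate_bits = 0.5 * np.log2(K_e[-1])   # per-step rate implied by the last innovation variance
print(K_e[0], K_e[-1], rate_bits)
```

In this toy case the innovation variance settles at the square of the unstable eigenvalue, so the per-step rate approaches $\log_2 2 = 1$ bit, in line with the steady-state results of Section VI.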

The meaning of a unified expression for the three different systems (38), (39), and (42) is that the first two are actually two different non-minimal realizations of the third. The input-output mappings from $N^T$ to $e^T$ in the three systems are $T$-equivalent (see Appendix I-B). Thus we say that the three problems, the optimal estimation problem, the optimal feedback generator problem, and the minimum-energy control problem, are equivalent in the sense that, if any one of the problems is solved, then the other two are solved. Since the estimation problem and the control problem are well studied, the equivalence facilitates our study of the communication problem. In particular, the formulation (42) yields alternative expressions for the mutual information and average channel input power in the feedback communication problem, as we see in the next subsection.

We further illustrate the relation between the estimation system and the communication system in Fig. 6: (b) is obtained from (a) by subtracting $\hat{r}_t$ from the channel input and adding $Z_T^{-1}\hat{r}^T$ back to the channel output, which does not affect the input, state, and output of $\widehat{\mathcal{G}}^*_T$. It is clearly seen from the block diagram manipulations that the minimization of channel input power in the feedback communication problem becomes the minimization of MSE in the estimation problem.


B. Mutual information in terms of Fisher information and CRB 
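Proposition 3. For any fixed $0 \le n \le T$ and $(A, C)$, it holds that

i)
$$
\begin{aligned}
I(W; y^T) &= \frac{1}{2}\log\det K_e^{(T)} = \frac{1}{2}\sum_{t=0}^{T}\log K_{e,t} \\
&= \frac{1}{2}\sum_{t=0}^{T}\log(\mathbb{C}\Sigma_t\mathbb{C}' + 1) \\
&= -\frac{1}{2}\log\det \mathrm{MMSE}_{W,T} \\
&= \frac{1}{2}\log\det \mathcal{I}_{W,T} \\
&= -\frac{1}{2}\log\det \mathrm{CRB}_{W,T};
\end{aligned} \tag{46}
$$

ii)
$$
\begin{aligned}
P_{T,n}(A, C) &= \frac{1}{T+1}\sum_{t=0}^{T}\mathbb{D}\Sigma_t\mathbb{D}' \\
&= \frac{1}{T+1}\,\mathrm{trace}(\mathrm{CMMSE}_{r,T}) \\
&= \frac{1}{T+1}\sum_{t=0}^{T} C A^t\, \mathrm{MMSE}_{W,t}\, A^{t\prime} C',
\end{aligned} \tag{47}
$$

where $\mathrm{MMSE}_{W,T}$ is the minimum MSE of $W$, $\mathrm{CMMSE}_{r,T}$ is the causal minimum MSE of $r^T$, $\mathcal{I}_{W,T}$ is the Bayesian Fisher information matrix of $W$ for the estimation system (38), and $\mathrm{CRB}_{W,T}$ is the Bayesian CRB of $W$ [33].

Fig. 6. Relation between the estimation problem (a) and the communication problem (b).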


Remark 3. This proposition connects the mutual information to the innovations process and to the Fisher information, (minimum) MSE, and CRB of the associated estimation problem. As a consequence, the finite-horizon feedback capacity $C_{T,n}$ is linked to the smallest possible Bayesian CRB, i.e. the smallest possible estimation error covariance, and thus the fundamental limitation in information theory is linked to the fundamental limitation in estimation theory. It is also interesting to notice that the Fisher information, an estimation quantity, indeed has an information theoretic interpretation, as its name suggests. Besides, the link between the mutual information and the MMSE provides a supplement to the fundamental relation discovered in [25]; the connections between our result and that in [25] are under current investigation.











































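Proof: i) First we simply notice that $h(y^T) = h(e^T)$, and $K_{e,t} = \mathbb{C}\Sigma_t\mathbb{C}' + 1$. Next, to find the MMSE of $W$, note that in Fig. 4(a)

$$ \bar{y}^T = Z_T^{-1}\Gamma W + N^T \tag{48} $$

and that $W \sim \mathcal{N}(0, I)$, $N^T \sim \mathcal{N}(0, I)$. Thus, by [30] we have

$$ \mathrm{MMSE}_{W,T} = (I + \Gamma' Z_T^{-1\prime} Z_T^{-1}\Gamma)^{-1}, \tag{49} $$

yielding

$$ \det \mathrm{MMSE}_{W,T} = \det(I + Z_T^{-1}\Gamma\Gamma' Z_T^{-1\prime})^{-1} = \det(I + Z_T^{-1} K_r^{(T)} Z_T^{-1\prime})^{-1}. \tag{50} $$

Besides, from Section 2.4 in [33] we can directly compute the FIM of $W$ to be $(I + \Gamma' Z_T^{-1\prime} Z_T^{-1}\Gamma)$. Then i) follows from Proposition 1 and (50).

ii) Since $u_t = \mathbb{D}\mathbb{X}_t = C\tilde{x}_t = r_t - \hat{r}_t$ and $\mathbf{E}\,\tilde{x}_t\tilde{x}_t' = A^t\,\mathrm{MMSE}_{W,t}\,A^{t\prime}$, we have $\mathbf{E}(u_t)^2 = \mathbb{D}\Sigma_t\mathbb{D}' = C\,\mathbf{E}\,\tilde{x}_t\tilde{x}_t'\,C' = \mathbf{E}(r_t - \hat{r}_t)^2$, and then ii) follows. □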
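The equalities in part i) can be checked numerically on a small example. The following is a minimal sketch, assuming $W \sim \mathcal{N}(0, I)$ and unit-variance white noise as in the proof; the toy system values and helper names are hypothetical. It compares the output-covariance form of the mutual information, the MMSE form obtained from (49), and the innovations form, where the innovation variances are read off the Cholesky factor of the output covariance.

```python
import numpy as np

def observability_matrix(A, C, T):
    """Stack C A^t for t = 0..T, so that r^T = Gamma W."""
    rows, M = [], np.eye(A.shape[0])
    for _ in range(T + 1):
        rows.append(C @ M)
        M = A @ M
    return np.vstack(rows)

def channel_matrix(F, G, H, T):
    """Lower-triangular Toeplitz matrix of the channel (F, G, H, 1)."""
    h, v = [1.0], G.copy()
    for _ in range(T):
        h.append((H @ v).item())   # impulse response term H F^{k-1} G
        v = F @ v
    Z_inv = np.zeros((T + 1, T + 1))
    for i in range(T + 1):
        for j in range(i + 1):
            Z_inv[i, j] = h[i - j]
    return Z_inv

# Hypothetical toy system (not from the paper).
A = np.array([[2.0]]); C = np.array([[1.0]])
F = np.array([[0.5]]); G = np.array([[1.0]]); H = np.array([[0.3]])
T = 8

M = channel_matrix(F, G, H, T) @ observability_matrix(A, C, T)   # Z^{-1} Gamma
K_y = np.eye(T + 1) + M @ M.T                        # covariance of y_bar^T, from (48)
mmse_W = np.linalg.inv(np.eye(A.shape[0]) + M.T @ M)  # Eq. (49)

info_output = 0.5 * np.linalg.slogdet(K_y)[1]
info_mmse = -0.5 * np.linalg.slogdet(mmse_W)[1]
# Innovation variances are the squared diagonal of the Cholesky factor of K_y.
K_e = np.diag(np.linalg.cholesky(K_y)) ** 2
info_innovations = 0.5 * np.sum(np.log(K_e))

print(info_output, info_mmse, info_innovations)
```

The agreement between the first two quantities is Sylvester's determinant identity, and the third follows because a causal, causally invertible whitening of the output leaves the determinant of its covariance unchanged.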

C. Necessary condition for optimality 

Before we turn to the infinite-horizon analysis, we show in this subsection that our general coding structure together with the optimal feedback generator satisfies a "necessary condition for optimality" discussed in [15]. The condition says that the channel input $u_t$ needs to be orthogonal to the past channel outputs $y^{t-1}$. This is intuitive: to ensure the fastest transmission, the transmitter should not transmit any information that the receiver has already obtained, so the transmitter wants to remove from $u_t$ any correlation with $y^{t-1}$ (to this aim, the transmitter has to access the channel outputs through feedback).

Proposition 4. In system (39), for any $0 \le \tau < t$, it holds that $\mathbf{E}\, u_t y_\tau = 0$.

Proof: See Appendix II-E. □


VI. Asymptotic behavior of the system 

So far we have completed our analysis in finite horizon. We have shown that the optimal design of encoder and decoder must contain a Kalman filter, and connected the feedback communication problem to an estimation problem and a control problem. Below, we consider the steady-state communication problem by studying the limiting behavior ($T$ going to infinity) of the finite-horizon solution while fixing the encoder dimension to be $(n + 1)$.


A. Convergence to steady-state 

The time-varying Kalman filter in (42) converges to a steady state; namely, (42) is stabilized in closed loop, $u_t$, $e_t$, and $y_t$ converge to steady-state distributions, and $\Sigma_t$, $L_t$, $\mathcal{G}^*_T(A, C)$, $\widehat{\mathcal{G}}^*_T$, and $K_{e,t}$ converge to their steady-state values. That is, asymptotically (42) becomes an LTI system

$$
\text{steady-state:}\quad
\begin{aligned}
\mathbb{X}_{t+1} &= (\mathbb{A} - L\mathbb{C})\mathbb{X}_t - L N_t = \mathbb{A}\mathbb{X}_t - L e_t, \\
e_t &= \mathbb{C}\mathbb{X}_t + N_t, \\
u_t &= \mathbb{D}\mathbb{X}_t,
\end{aligned} \tag{51}
$$

where

$$ L := \frac{\mathbb{A}\Sigma\mathbb{C}'}{K_e}, \tag{52} $$

$K_e = \mathbb{C}\Sigma\mathbb{C}' + 1$, and $\Sigma$ is the unique stabilizing solution to the Riccati equation

$$ \Sigma = \mathbb{A}\Sigma\mathbb{A}' - \frac{\mathbb{A}\Sigma\mathbb{C}'\mathbb{C}\Sigma\mathbb{A}'}{\mathbb{C}\Sigma\mathbb{C}' + 1}. \tag{53} $$

This LTI system is easy to analyze (e.g., it allows transfer function based study) and to implement. For instance, the minimum-energy control (cf. [32]) of an LTI system claims that the transfer function from $N$ to $e$ is an all-pass function in the form of

$$ T_{Ne}(z) = \prod_{i=0}^{k}\frac{z - a_i}{z - a_i^{-1}}, \tag{54} $$

where $a_0, \ldots, a_k$ are the unstable eigenvalues of $A$ or $\mathbb{A}$ (noting that $F$ is stable). Note that this is consistent with the whiteness of the innovations process $\{e_t\}$.
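The all-pass property of (54) is easy to confirm numerically. The following is a minimal sketch, assuming real unstable eigenvalues (the values 2 and -1.5 are hypothetical): it evaluates the product on the unit circle and checks that its magnitude is constant and equal to the product of the eigenvalue magnitudes.

```python
import numpy as np

def all_pass_gain(unstable_eigs, theta):
    """Magnitude of prod_i (z - a_i) / (z - 1/a_i) evaluated at z = exp(j*theta)."""
    z = np.exp(1j * theta)
    response = np.ones_like(z)
    for a in unstable_eigs:
        response = response * (z - a) / (z - 1.0 / a)
    return np.abs(response)

theta = np.linspace(-np.pi, np.pi, 721)
eigs = [2.0, -1.5]                      # hypothetical real unstable eigenvalues
gain = all_pass_gain(eigs, theta)
degree_of_instability = float(np.prod(np.abs(eigs)))

print(gain.min(), gain.max(), degree_of_instability)
```

A flat gain means that unit-variance white noise at the input produces a white output whose variance is the squared product of the unstable eigenvalue magnitudes, here 9.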

The existence of a steady state of the Kalman filter is proven in the following proposition. Notice that (42) is a singular Kalman filter since it has no process noise; the convergence of such a problem was established in [34].


Proposition 5. Consider the Riccati recursion (44) and the system (42).

i) Starting from the initial condition given in (45), the Riccati recursion (44) generates a sequence $\{\Sigma_t\}$ that converges to $\Sigma$ with rank $(n + 1)$, the unique stabilizing solution to the Riccati equation (53).

ii) The time-varying system (42) converges to the unique steady state as given in (51).

Proof: See Appendix III-A. □


B. Steady-state quantities 

Now fix $(A, C)$ and let the horizon $T$ in the general coding structure go to infinity. Let $\bar{h}(e)$ be the entropy rate of $\{e_t\}$, $DI(A) := \prod_{i=0}^{k}|a_i|$ be the degree of instability of $A$, and $S(e^{j2\pi\theta})$ be the spectrum of the sensitivity function of system (51) (cf. [14]). Then the limiting result of Proposition 3 is summarized in the next proposition.


Proposition 6. Consider the general coding structure in Fig. 0. For any $n \ge 0$ and $(A, C)$,

i) The asymptotic information rate is given by

$$
\begin{aligned}
R_{\infty,n}(A, C) &:= \lim_{T\to\infty}\frac{1}{T+1} I(W; y^T) \\
&= \bar{h}(e) - \frac{1}{2}\log 2\pi e \\
&= \log DI(A) \\
&= \frac{1}{2}\int_{-1/2}^{1/2}\log S(e^{j2\pi\theta})\, d\theta \\
&= \frac{1}{2}\log(\mathbb{C}\Sigma\mathbb{C}' + 1) \\
&= \lim_{T\to\infty}\frac{\log\det \mathcal{I}_{W,T}}{2(T+1)} \\
&= -\lim_{T\to\infty}\frac{\log\det \mathrm{MMSE}_{W,T}}{2(T+1)} \\
&= -\lim_{T\to\infty}\frac{\log\det \mathrm{CRB}_{W,T}}{2(T+1)};
\end{aligned} \tag{55}
$$

ii) The average channel input power is given by

$$ P_{\infty,n}(A, C) := \lim_{T\to\infty}\frac{1}{T+1}\,\mathbf{E}\, u^{T\prime} u^T = \mathbb{D}\Sigma\mathbb{D}'. \tag{56} $$










Remark 4. Proposition 6 links the asymptotic information rate to the entropy rate of the innovations process, to the degree of instability and Bode sensitivity integral ([14]), to the asymptotic increasing rate of the Fisher information, and to the asymptotic decay rate of the MSE and of the CRB. Recall that the Bode sensitivity integral is the fundamental limitation of the disturbance rejection (control) problem, and the asymptotic decay rate of the CRB is the fundamental limitation of the recursive estimation problem. Hence, the fundamental limitations in feedback communication, control, and estimation coincide.

Remark 5. Proposition 6 implies that the presence of stable eigenvalues in $A$ does not affect the rate (see also [14]). Stable eigenvalues do not affect $P_{\infty,n}(A, C)$ either, since the initial condition response associated with the stable eigenvalues can be tracked with zero power (i.e. zero MSE). So, we can achieve $C_{\infty,n}$ by a sequence of purely unstable $(A, C)$, and hence the communication problem is related to the tracking of a purely unstable source over a communication channel ([13], [14]).

Proof: Proposition 5 implies that the limits of the results in Proposition 3 are well defined. Then

$$
\begin{aligned}
R_{\infty,n}(A, C) &= \lim_{T\to\infty}\frac{1}{2(T+1)}\sum_{t=0}^{T}\log K_{e,t} \\
&= \lim_{T\to\infty}\frac{1}{2}\log K_{e,T} \\
&= \bar{h}(e) - \frac{1}{2}\log 2\pi e,
\end{aligned} \tag{57}
$$

where the second equality is due to the Cesàro mean (i.e., if $a_k$ converges to $a$, then the average of the first $k$ terms converges to $a$ as $k$ goes to infinity), and the last equality follows from the definition of the entropy rate of a Gaussian process (cf. [35]).

Now by (54), $\{e_t\}$ has a flat power spectrum with magnitude $DI(A)^2$. Then $R_{\infty,n}(A, C) = \log DI(A)$. The Bode integral of sensitivity follows from [14]. The other equalities are direct applications of the Cesàro mean to the results in Proposition 3. □

VII. Achievability of $C_\infty$

In this section, we will prove that $C_{\infty,m-1} = C_\infty$, leading to the optimality of our encoder/decoder design in Section III in the mutual information sense, and then show that our design achieves $C_\infty$ in the operational sense.

A. The optimal Gauss-Markov signalling strategy and a simplification

[12] proved that for each input in the form of (79), there exists a Gauss-Markov (GM) input that yields the same directed information and the same input power. The GM input takes the form

$$ u_t = d_t'\tilde{s}_{s,t} + \xi_t, \tag{58} $$

where $d_t \in \mathbb{R}^m$ is a time-varying gain; $\{\xi_t\}$ is a zero-mean white Gaussian process and $\xi_t$ is independent of $N^{t-1}$, $u^{t-1}$, and $y^{t-1}$; and $\tilde{s}_{s,t}$ is generated by a Kalman filter (noting that this Kalman filter is different from the Kalman filter obtained in this paper)

$$
\begin{aligned}
\tilde{s}_{s,t} &:= s_t - \hat{s}_{s,t}, \\
\hat{s}_{s,t+1} &= F\hat{s}_{s,t} + L_{s,t} e_t, \\
e_t &= y_t - H\hat{s}_{s,t},
\end{aligned} \tag{59}
$$

where $\hat{s}_{s,0} = 0$,

$$ L_{s,t} := \frac{Q_t\Sigma_{s,t}(H + d_t')' + K_\xi^{(t)} G}{1 + K_\xi^{(t)} + (H + d_t')\Sigma_{s,t}(H + d_t')'}, \tag{60} $$

$Q_t := F + G d_t'$, and $\Sigma_{s,t} := \mathbf{E}\,\tilde{s}_{s,t}\tilde{s}_{s,t}'$ is the estimation error covariance of $s_t$, satisfying the Riccati recursion

$$ \Sigma_{s,t+1} = Q_t\Sigma_{s,t}Q_t' + K_\xi^{(t)} G G' - \frac{\big(Q_t\Sigma_{s,t}(H + d_t')' + K_\xi^{(t)} G\big)\big(Q_t\Sigma_{s,t}(H + d_t')' + K_\xi^{(t)} G\big)'}{1 + K_\xi^{(t)} + (H + d_t')\Sigma_{s,t}(H + d_t')'}. \tag{61} $$

If one lets $d_t = d$ and $K_\xi^{(t)} = K_\xi$ for all $t$, that is, the input $\{u_t\}$ is a stationary process, then the search over all possible $d$ and $K_\xi$ solves $C_\infty$, that is,

$$ C_\infty(\mathcal{P}) = \max_{d \in \mathbb{R}^m,\ K_\xi \in \mathbb{R}} \frac{1}{2}\log\big(1 + K_\xi + (H + d')\Sigma_s(H + d')'\big) \tag{62} $$

subject to the Riccati equation constraint and the power constraint

$$
\begin{aligned}
\Sigma_s &= Q\Sigma_s Q' + K_\xi G G' - \frac{\big(Q\Sigma_s(H + d')' + K_\xi G\big)\big(Q\Sigma_s(H + d')' + K_\xi G\big)'}{1 + K_\xi + (H + d')\Sigma_s(H + d')'}, \\
\mathcal{P} &= d'\Sigma_s d + K_\xi.
\end{aligned} \tag{63}
$$

We remark that [12] focused more on the structure of the optimal input distribution and on capacity computation than on designing a coding scheme; how to encode/decode a message (rather than using a random coding argument) is not clear from [12].

Now we prove that $K_\xi = 0$, namely $\{\xi_t\}$ vanishes in steady state.^6 This leads to a further simplification of the results in [12].


Proposition 7. For the GM input (58) to achieve $C_\infty$, it must hold that $K_\xi = 0$.

Proof: See Appendix IV. □

The vanishing of $\{\xi_t\}$ in steady state helps us to show that our general coding structure shown in Fig. 0 can achieve $C_\infty$, and that the encoder dimension need not be higher than the channel dimension; namely, to achieve $C_\infty$ we need $A$ to have at most $m$ unstable eigenvalues, as we will see in the next subsection.


B. Generality of the general coding structure; finite dimensionality of the optimal solution

In this subsection, we show that the general coding structure is sufficient to achieve the mutual information $C_\infty$. In other words, if we search over all admissible parameters $A$, $C$, $\mathcal{G}_T$ in the general coding structure, allowing $T$ to increase to infinity and $n$ to increase to $(m - 1)$, then we can obtain $C_\infty$. Thus, there is no loss of generality or optimality in considering only the general coding structure with encoder dimension no greater than $m$.

Definition 2. Consider the general coding structure in Fig. 0. Let

$$ C_{\infty,n} := C_{\infty,n}(\mathcal{P}) := \sup_{A \in \mathbb{R}^{(n+1)\times(n+1)},\ C,\ \mathcal{G}_\infty}\ \lim_{T\to\infty}\frac{1}{T+1} I(W; y^T) \tag{64} $$

subject to

$$ P_{\infty,n} := \lim_{T\to\infty}\frac{1}{T+1}\,\mathbf{E}\, u^{T\prime} u^T \le \mathcal{P}. \tag{65} $$

In other words, $C_{\infty,n}$ is the infinite-horizon information capacity for a fixed transmitter dimension. Note that $C_{\infty,n}$ exists and is finite. To see this, note Proposition 6, $C_{\infty,n} \le C_\infty < \infty$, and the fact that

$$ C_{\infty,n}(\mathcal{P}) = \sup_{A \in \mathbb{R}^{(n+1)\times(n+1)},\ C,\ \mathcal{G}^*(A,C),\ \text{(65)}} R_{\infty,n}(A, C). \tag{66} $$

The function $C_{\infty,n}(\mathcal{P})$ also induces $P_{\infty,n}(\mathcal{R})$, the "capacity" in terms of minimum input power subject to an information rate constraint.

^6 $K_\xi = 0$ was also conjectured and numerically verified by Shaohua Yang (personal communication).






Proposition 8. Consider the general coding structure in Fig. 0.

i) $C_{\infty,n}$ is increasing in $n$;

ii) For channel $\mathcal{F}$ with order $m \ge 1$, $C_{\infty,n} = C_\infty$ for $n \ge m - 1$.

Proof: See Appendix IV-A. □

This proposition suggests that, to achieve $C_\infty$, we may first fix the transmitter dimension as $(n + 1)$ and let the dynamical system run to time infinity, obtaining $C_{\infty,n}$, and then increase $n$ to $(m - 1)$. The finite dimensionality of the optimal solution is important since it guarantees that we can achieve $C_\infty$ without solving an infinite-dimensional optimization problem.

C. Achieving $C_\infty$

In this subsection, we prove that our coding scheme achieves $C_\infty$ in the information sense as well as in the operational sense.

Proposition 9. For the coding scheme described in Theorem 1, $R_{\infty,n^*}(A^*, C^*) = C_\infty(\mathcal{P})$ and $P_{\infty,n^*}(A^*, C^*) = \mathcal{P}$.

Proof: See Appendix IV-B. □

Proposition 10. The system constructed in Theorem 1 transmits the analog source $W \sim \mathcal{N}(0, I)$ at a rate $C_\infty(\mathcal{P})$, with MSE distortion $D(C_\infty(\mathcal{P}))$, where $D(\cdot)$ is the distortion-rate function.

Proof: See Appendix IV-C. □

Proposition 11. The system constructed in Theorem 1 transmits a digital message $W$ from the transmitter to the receiver at a rate arbitrarily close to $C_\infty(\mathcal{P})$, with $PE_T$ decaying doubly exponentially.

Proof: See Appendix IV-D. □

Note that the coding length needed for a pre-specified performance level can be pre-determined, since $\Sigma_T$ can be solved off-line. Besides, because the probability of error decays doubly exponentially, the scheme requires a much shorter coding length than forward transmission.


VIII. Numerical example 


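Here we repeat the numerical example studied in [12]. Consider a third-order channel (i.e. $m = 3$) with

$$ Z^{-1}(z) := \frac{1 + 0.5z^{-1} - 0.4z^{-2}}{1 + 0.6z^{-2} - 0.4z^{-3}}. \tag{67} $$

In state-space, $Z^{-1}$ is described as $(F, G, H, 1)$ where

$$ F = \begin{bmatrix} 0 & -0.6 & 0.4 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \quad G = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad H = \begin{bmatrix} 0.5 & -1 & 0.4 \end{bmatrix}. \tag{68} $$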
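As a sanity check on the realization (68), one can compare its impulse response with the one obtained directly from the difference equation implied by (67). The following is a minimal sketch; the helper names are hypothetical, and only the matrices and coefficients come from the text above. It also prints the spectral radius of $F$, which should be below one for a stable channel.

```python
import numpy as np

# State-space realization (F, G, H, 1) from (68).
F = np.array([[0.0, -0.6, 0.4],
              [1.0,  0.0, 0.0],
              [0.0,  1.0, 0.0]])
G = np.array([[1.0], [0.0], [0.0]])
H = np.array([[0.5, -1.0, 0.4]])

def impulse_state_space(F, G, H, D, steps):
    """Impulse response D, HG, HFG, HF^2G, ... of the realization (F, G, H, D)."""
    h, v = [D], G.copy()
    for _ in range(steps - 1):
        h.append((H @ v).item())
        v = F @ v
    return np.array(h)

def impulse_difference_equation(num, den, steps):
    """Impulse response of num(z^-1)/den(z^-1), with den[0] == 1."""
    y = np.zeros(steps)
    for t in range(steps):
        acc = num[t] if t < len(num) else 0.0
        for k in range(1, len(den)):
            if t - k >= 0:
                acc -= den[k] * y[t - k]
        y[t] = acc
    return y

num = [1.0, 0.5, -0.4]          # 1 + 0.5 z^-1 - 0.4 z^-2
den = [1.0, 0.0, 0.6, -0.4]     # 1 + 0.6 z^-2 - 0.4 z^-3

h_ss = impulse_state_space(F, G, H, 1.0, 20)
h_tf = impulse_difference_equation(num, den, 20)
spectral_radius = max(abs(np.linalg.eigvals(F)))

print(h_ss[:4], spectral_radius)
```

The two impulse responses agree, and the first four samples are 1, 0.5, -1, and 0.1.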


Assume the desired communication rate 77. is 1 bit per channel use. We first solve o with n = m — 1 = 
2, and find out n* = 1. That is, C oo is attained when A has two unstable eigenvalues. Then we solve 
o again with n* = 1, and obtain 


-2 -0.887 

—0.506 -0.225 0.573 0.092 -0.327]'. 


L 


(69) 














Fig. 7. The asymptotic feedback capacity $C_\infty$ and feedforward capacity for channel $\mathcal{F}$ with $Z^{-1} = (1 + 0.5z^{-1} - 0.4z^{-2})/(1 + 0.6z^{-2} - 0.4z^{-3})$.


This yields the optimal power 0.743 (or -1.290 dB). Similar computation generates Fig. 7, the curve of $C_\infty$ against SNR or equivalently $\mathcal{P}$. This curve is identical to that in [12].

We then use the obtained $A^*$, $C^*$, and $L^*$ to construct the optimal communication scheme. However, we observe that the optimal communication scheme shown in Fig. 0 generates unbounded signals $\{r_t\}$ and $\{\hat{r}_t\}$ due to the instability of $A$. This is not desirable for simulation purposes, though the scheme in the form of Fig. 0 is convenient for analysis. Here, we propose a modification of the scheme; see Fig. 8. It is easily verified that the system in Fig. 8 is $T$-equivalent to that in Fig. 0. As we indicate in Fig. 8, the loop including the encoder, the channel, and the feedback link is indeed the control setup, which is stabilized, and hence any signal inside is bounded.^6 Note that the encoder now involves $\tilde{x}_{-1}$; we set $\tilde{x}_{-1} := A^{-1}W$, leading to $\tilde{x}_0 = W$, the desired value for $\tilde{x}_0$.



Fig. 8. The modified feedback communication scheme.

We report the simulation results using the modified communication scheme with the optimal parameters given in (69). Fig. 9(a) shows the convergence of $\hat{x}_{0,t}$ to $x_0$, in which $x_0 := [-0.2, -0.7]'$. Fig. 9(a) also shows the time average of the channel input power, which converges to the optimal power 0.743. To compute the probability of error, we let $\epsilon = 0.2$, i.e., the signalling rate is equal to $0.8\,C_\infty$. We demonstrate that this signalling rate is achieved by showing that the simulated probability of error decays to zero; see Fig. 9(b). Fig. 9(b) also plots the theoretic probability of error, which is almost identical to the simulated curve. In addition, we see that the probability of error decays rather fast within 28 channel uses. The fast decay implies that the proposed scheme allows shorter coding length and shorter coding delay; here coding delay measures the number of time steps that one has to wait for the message to be decoded at the receiver with small enough error probability.

^6 We remark that, in the case of an AWGN channel, the modification coincides with the one studied by Gallager (p. 480, [36]) with minor differences. This modification differs from the more popular feedback communication designs in [1], [2], [14]; notice that [1] involves exponentially growing bandwidth, [2] involves an exponentially growing parameter $a^k$ where $a > 1$ and $k$ denotes the time index, and [14] generates a feedback signal with exponentially growing power. Thus we consider our modification more feasible for simulation purposes. However, this modification is not yet "practical", mainly because of the strong assumption of noiseless feedback. A more practical design is under current investigation.

Fig. 9. (a) Convergence of $\hat{x}_{0,t}$ to $x_0$, and convergence of the average channel input power. (b) Simulated probability of error and theoretic probability of error.


IX. Conclusions and future work 

We presented a coding scheme to achieve the asymptotic capacity $C_\infty$ for a Gaussian channel with feedback. The scheme is essentially the Kalman filter algorithm, and its construction involves only a finite-dimensional optimization problem. We established connections of feedback communication to estimation and control. We have seen that concepts in estimation theory and control theory, such as MMSE, CRB, and minimum-energy control, are useful in studying a feedback communication system. We also verified the results by simulations.

Our ongoing research includes convexifying the optimization problem to reduce the computational complexity, and finding a more feasible scheme to fight against feedback noise while keeping the feedback signal bounded. In the future, we will further explore the connections among information, estimation, and control in more general setups (such as MIMO channels with feedback).

Appendix I 

Systems representations and equivalence 

The concept of system representations and the equivalence between different representations are extensively used in this paper. In this appendix, we briefly introduce system representations and the equivalence. For a more thorough treatment, see e.g. [37]-[39].

A. Systems representations

Any discrete-time linear system can be represented as a linear mapping (or a linear operator) from its input space to its output space; for example, we can describe a single-input single-output (SISO) linear system as

$$ y^t = \mathcal{M}_t u^t \tag{70} $$

for any $t$, where $\mathcal{M}_t \in \mathbb{R}^{(t+1)\times(t+1)}$ is the matrix representation of the linear operator, $u^t \in \mathbb{R}^{t+1}$ is the stacked input vector consisting of inputs from time 0 to time $t$, and $y^t \in \mathbb{R}^{t+1}$ is the stacked output vector consisting of outputs from time 0 to time $t$. For a (strictly) causal SISO LTI system, $\mathcal{M}_t$ is a (strictly) lower-triangular Toeplitz matrix formed by the coefficients of the impulse response. Such a system may also be described by its (reduced) transfer function, whose inverse $z$-transform is the impulse response; by a (reduced) transfer function we mean that none of its zeros is at the same location as a pole.

A causal SISO LTI system can be realized in state-space as

$$
\begin{aligned}
x_{t+1} &= A x_t + B u_t, \\
y_t &= C x_t + D u_t,
\end{aligned} \tag{71}
$$

where $x_t \in \mathbb{R}^l$ is the state, $u_t \in \mathbb{R}$ is the input, and $y_t \in \mathbb{R}$ is the output. We call $l$ the dimension or the order of the realization. The state-space representation (71) may be denoted as $(A, B, C, D)$. Note that in the study of input-output relations, it is sometimes convenient to assume that the system is relaxed or at initial rest (i.e. zero input leads to zero output), whereas in the study of state-space, we generally allow $x_0 \ne 0$, which is not at initial rest. For multi-input multi-output (MIMO) systems, linear time-varying systems, etc., see [38], [39].
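The correspondence between the operator form (70) and the state-space form (71) can be illustrated with a few lines of code. The following is a minimal sketch, assuming a SISO system at initial rest; the numerical values and function names are hypothetical. It builds the lower-triangular Toeplitz matrix from the impulse response and checks it against a direct simulation of the state equations.

```python
import numpy as np

def impulse_response(A, B, C, D, t):
    """Impulse response h_0 = D, h_k = C A^{k-1} B for k = 1..t."""
    h, v = [float(D)], B.copy()
    for _ in range(t):
        h.append((C @ v).item())
        v = A @ v
    return np.array(h)

def toeplitz_operator(h):
    """Lower-triangular Toeplitz matrix M with M[i, j] = h[i - j] for j <= i."""
    n = len(h)
    M = np.zeros((n, n))
    for i in range(n):
        M[i, :i + 1] = h[i::-1]
    return M

def simulate(A, B, C, D, u):
    """Run the state-space recursion (71) from x_0 = 0 (initial rest)."""
    x = np.zeros((A.shape[0], 1))
    y = []
    for u_t in u:
        y.append((C @ x).item() + D * u_t)
        x = A @ x + B * u_t
    return np.array(y)

# Hypothetical second-order SISO system.
A = np.array([[0.5, 0.1], [0.0, -0.3]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, -1.0]])
D = 1.0

t = 9
u = np.random.default_rng(1).normal(size=t + 1)
M_t = toeplitz_operator(impulse_response(A, B, C, D, t))
y_operator = M_t @ u
y_simulated = simulate(A, B, C, D, u)
print(np.allclose(y_operator, y_simulated))
```

Setting `D = 0.0` gives a zero diagonal in `M_t`, which is the strictly causal case mentioned in the text.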

The state-space representation of a causal FDLTI system $\mathcal{M}(z)$ is not unique. We call a realization $(A, B, C, D)$ minimal if $(A, B)$ is controllable and $(A, C)$ is observable. All minimal realizations of $\mathcal{M}(z)$ have the same dimension, which is the minimum dimension of all possible realizations. All other realizations are called non-minimal.
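Minimality can be tested with the standard rank conditions on the controllability and observability matrices. The following is a minimal sketch for small, well-conditioned examples (the matrices and function names are hypothetical); numerical rank tests of this kind can be unreliable for large or badly scaled systems.

```python
import numpy as np

def controllability_matrix(A, B):
    """[B, AB, ..., A^{n-1} B] for an n-dimensional realization."""
    blocks, v = [], B
    for _ in range(A.shape[0]):
        blocks.append(v)
        v = A @ v
    return np.hstack(blocks)

def is_minimal(A, B, C):
    """True if (A, B) is controllable and (A, C) is observable."""
    n = A.shape[0]
    ctrb = controllability_matrix(A, B)
    # Observability of (A, C) is controllability of (A', C'), by duality.
    obsv = controllability_matrix(A.T, C.T).T
    return bool(np.linalg.matrix_rank(ctrb) == n and np.linalg.matrix_rank(obsv) == n)

A = np.array([[0.5, 0.0], [0.0, -0.3]])
B = np.array([[1.0], [1.0]])

minimal = is_minimal(A, B, np.array([[1.0, 1.0]]))       # both modes visible at the output
non_minimal = is_minimal(A, B, np.array([[1.0, 0.0]]))   # second mode hidden from the output
print(minimal, non_minimal)
```

In the second case the mode at -0.3 never reaches the output, so the same input-output map could be realized with a single state.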

An example 

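We demonstrate here how we can derive a minimal realization of a system. Consider $\mathcal{G}^*_T(A, C)$ in (32) in Section IV, which is given by

$$ \mathcal{G}^*_T(A, C) = -\widehat{\mathcal{G}}^*_T (I - Z_T^{-1}\widehat{\mathcal{G}}^*_T)^{-1}, \tag{72} $$

where the state-space representations for $\widehat{\mathcal{G}}^*_T(A, C)$ and $Z_T^{-1}$ are illustrated in Fig. 4(b) and Fig. 1(c). Since (72) suggests a feedback connection of $\widehat{\mathcal{G}}^*_T$ and $Z_T^{-1}$ as shown in Fig. 10, we can write the state-space for $\mathcal{G}^*_T$ as

$$
\begin{aligned}
\hat{x}_{t+1} &= A\hat{x}_t + L_{1,t} e_t, \\
\hat{r}_t &= C\hat{x}_t, \\
\hat{s}_{t+1} &= F\hat{s}_t + G\hat{r}_t + L_{2,t} e_t, \\
e_t &= \bar{y}_t - H\hat{s}_t - \hat{r}_t, \\
s_{a,t+1} &= F s_{a,t} + G\hat{r}_t, \\
\bar{y}_t &= y_t + H s_{a,t} + \hat{r}_t.
\end{aligned} \tag{73}
$$

Then, replacing $\hat{s}_t$ by $\hat{s}_t - s_{a,t}$ (with a slight abuse of notation), we have

$$
\begin{aligned}
\hat{x}_{t+1} &= A\hat{x}_t + L_{1,t} e_t, \\
\hat{r}_t &= C\hat{x}_t, \\
\hat{s}_{t+1} &= F\hat{s}_t + L_{2,t} e_t, \\
e_t &= y_t - H\hat{s}_t.
\end{aligned} \tag{74}
$$

It is straightforward to check that this dynamics is controllable and observable, and therefore it is a minimal realization of $\mathcal{G}^*_T$.

Fig. 10. $\mathcal{G}^*$ is a feedback connection of $\widehat{\mathcal{G}}^*$ and $Z^{-1}$.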


B. Equivalence between representations 

Definition 3. i) Two FDLTI systems represented in state-space are said to be equivalent if they admit 
a common transfer function (or a common transfer function matrix) and they are both stabilizable and 
detectable. 

ii) Fix 0 < T < oo. Two linear mappings Mi t T '■ i = 1,2, both at initial rest, 

are said to be T -equivalent if for any u T E M.' lir+l> , it holds that 

Ali ,t(u T ) = M-2 (75) 

We note that i) is defined for FDLTI systems, whereas ii) is for general linear systems. i) implies that the realizations of a transfer function are not necessarily equivalent. However, if we focus on all realizations that do not “hide” any unstable modes, namely those in which all the unstable modes are both controllable from the input and observable from the output, they are equivalent; the converse is also true. ii) concerns only the finite-horizon input-output relations. Since the states are not specified in ii), it does not readily extend to the infinite horizon: any unstable modes “hidden” from the input and output will grow unboundedly regardless of the input and output, which is undesirable.

Examples 

As we mentioned in Section II-B, for any $u^T$ and $N^T$, Fig. 1(a) and (b) generate the same channel output $\tilde y^T$. That is, the mappings from $(u^T, N^T)$ to $\tilde y^T$ for the two channels are identical, and both are given by

$$\tilde y^T = \mathcal{Z}_T(\mathcal{Z}_T^{-1}u^T + N^T). \tag{76}$$

Thus, we say the two channels are $T$-equivalent.
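As a numerical illustration of this T-equivalence, the sketch below simulates both channel forms for a scalar-state example and compares their outputs. It assumes that $\mathcal{Z}^{-1}$ has the state-space form (F, G, H, 1), so that its inverse is (F - GH, G, -H, 1); the parameter values, the function names, and the scalar state are hypothetical choices made only for this example.

```python
import random


def run_system(F, G, H, inputs):
    """Scalar system (F, G, H, 1) at initial rest:
    out_t = H*s_t + in_t,  s_{t+1} = F*s_t + G*in_t."""
    s, outputs = 0.0, []
    for x in inputs:
        outputs.append(H * s + x)
        s = F * s + G * x
    return outputs


def z_inverse(F, G, H, inputs):
    """Apply Z^{-1}, assumed to be the system (F, G, H, 1)."""
    return run_system(F, G, H, inputs)


def z_forward(F, G, H, inputs):
    """Apply Z, the inverse of (F, G, H, 1), i.e. (F - G*H, G, -H, 1)."""
    return run_system(F - G * H, G, -H, inputs)


def colored_noise_channel(F, G, H, u, noise):
    """Output u + Z*N of the colored Gaussian noise channel."""
    colored = z_forward(F, G, H, noise)
    return [a + b for a, b in zip(u, colored)]


def isi_channel_then_z(F, G, H, u, noise):
    """Output Z*(Z^{-1}*u + N): ISI channel followed by Z."""
    y = [a + b for a, b in zip(z_inverse(F, G, H, u), noise)]
    return z_forward(F, G, H, y)


random.seed(0)
F, G, H, T = 0.5, 1.0, 0.3, 20  # hypothetical channel parameters and horizon
u = [random.gauss(0, 1) for _ in range(T + 1)]
noise = [random.gauss(0, 1) for _ in range(T + 1)]
out_a = colored_noise_channel(F, G, H, u, noise)
out_b = isi_channel_then_z(F, G, H, u, noise)
max_gap = max(abs(a - b) for a, b in zip(out_a, out_b))
```

Up to floating-point rounding, `max_gap` is zero for any input and noise sequences, which is the finite-horizon input-output agreement that Definition 3 ii) asks for.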

The feedback communication system <E3, estimation system (El, and control system (El are T- 
equivalent, since for any N T , they generate the same innovations e T . 

Appendix II 

Finite-horizon: The feedback capacity and the CP structure 
A. Feedback capacity Ct 

The following definition of feedback capacity is based on [11].

Definition 4. The “operational” or “information” finite-horizon feedback capacity $C_T$, subject to the average channel input power constraint

$$P_T := \frac{1}{T+1}\,\mathrm{E}\,u^{T\prime}u^T \le \mathcal{P}, \tag{77}$$

is

$$C_T(\mathcal{P}) := C_T := \sup \frac{1}{T+1}\, I(u^T \to y^T), \tag{78}$$

where $I(u^T \to y^T)$ is the directed information from $u^T$ to $y^T$, and the supremum is over all possible feedback-dependent input distributions satisfying (77) and in the form

$$u_t = \gamma_t u^{t-1} + \eta_t y^{t-1} + \xi_t \tag{79}$$

for any $\gamma_t \in \mathbb{R}^{1\times t}$, $\eta_t \in \mathbb{R}^{1\times t}$, and zero-mean Gaussian random variable $\xi_t \in \mathbb{R}$ independent of $u^{t-1}$ and $y^{t-1}$.


B. CP structure for colored Gaussian noise channel 

We briefly review the CP coding structure for the colored Gaussian noise channel specified in Section El; see [6], [35] for more details of the CP structure. Let the colored Gaussian noise $Z^T$ have covariance matrix $K_Z^{(T)}$, and

$$u^T := B_T Z^T + v^T, \tag{80}$$

where $B_T$ is a $(T+1)\times(T+1)$ strictly lower triangular matrix, and $v^T$ is Gaussian with covariance $K_v^{(T)} \ge 0$ and is independent of $Z^T$.⁷ This generates channel output

$$\tilde y^T = (I + B_T)Z^T + v^T. \tag{81}$$


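Then the highest rate that the CP structure can achieve, in both the operational and the information sense, is

$$\begin{aligned}
C_{T,CP}(\mathcal{P}) &= \sup \frac{1}{T+1}\, I(v^T; \tilde y^T)\\
&= \sup \frac{1}{2(T+1)}\log\frac{\det K_{\tilde y}^{(T)}}{\det K_Z^{(T)}}\\
&= \sup \frac{1}{2(T+1)}\log\frac{\det\left((I+B_T)K_Z^{(T)}(I+B_T)' + K_v^{(T)}\right)}{\det K_Z^{(T)}},
\end{aligned} \tag{82}$$

where the supremum is taken over all admissible $K_v^{(T)}$ and $B_T$ satisfying the power constraint

$$P_T := \frac{1}{T+1}\,\mathrm{tr}\left(B_TK_Z^{(T)}B_T' + K_v^{(T)}\right) \le \mathcal{P}. \tag{83}$$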


Since the operational capacity definitions in [6] and [11] coincide, we have $C_{T,CP}(\mathcal{P}) = C_T(\mathcal{P})$. This may also be seen by observing that any channel input (79) can be rewritten in the form of (80); since (79) is sufficient to achieve $C_T$, we conclude that (80) is also sufficient to achieve $C_T$.

C. CP structure for ISI Gaussian channel 

By using the equivalence between the colored Gaussian noise channel and the ISI channel $\mathcal{F}$, we can derive the CP coding structure for $\mathcal{F}$, which is obtained from (80) by introducing a new quantity $r^T$ as

$$r^T := (I + B_T)^{-1}v^T. \tag{84}$$

By $Z^T = \mathcal{Z}_TN^T$ and $\tilde y^T = \mathcal{Z}_Ty^T$, we have

$$\begin{aligned}
u^T &= B_T\mathcal{Z}_TN^T + (I + B_T)r^T\\
y^T &= \mathcal{Z}_T^{-1}(I + B_T)\mathcal{Z}_TN^T + \mathcal{Z}_T^{-1}(I + B_T)r^T\\
&= \mathcal{Z}_T^{-1}(I + B_T)(\mathcal{Z}_TN^T + r^T).
\end{aligned} \tag{85}$$

This implies that the channel input $u^T$ can be represented as

$$u^T = (I + B_T)^{-1}B_T\mathcal{Z}_Ty^T + r^T, \tag{86}$$










which leads to the block diagram in Fig. 11.

Fig. 11. The block diagram of the CP structure for ISI Gaussian channel $\mathcal{F}$.
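For completeness, the representation (86) can be checked directly from (85). The short derivation below is a sketch of that step; it uses only the fact that $B_T$ and $(I+B_T)^{-1}$ commute.

$$\begin{aligned}
\mathcal{Z}_TN^T &= (I+B_T)^{-1}\mathcal{Z}_Ty^T - r^T && \text{(from the last line of (85))}\\
u^T &= B_T\left[(I+B_T)^{-1}\mathcal{Z}_Ty^T - r^T\right] + (I+B_T)r^T && \text{(first line of (85))}\\
&= B_T(I+B_T)^{-1}\mathcal{Z}_Ty^T + r^T\\
&= (I+B_T)^{-1}B_T\mathcal{Z}_Ty^T + r^T.
\end{aligned}$$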
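The capacity $C_T$ now takes the form

$$\begin{aligned}
C_T(\mathcal{P}) &= \sup \frac{1}{2(T+1)}\log\det K_y^{(T)}\\
&= \sup \frac{1}{2(T+1)}\log\det\left(\mathcal{Z}_T^{-1}(I + B_T)(\mathcal{Z}_T\mathcal{Z}_T' + K_r^{(T)})(I + B_T)'\mathcal{Z}_T^{-1\prime}\right)\\
&= \sup \frac{1}{2(T+1)}\log\det\left(\mathcal{Z}_T\mathcal{Z}_T' + K_r^{(T)}\right),
\end{aligned} \tag{87}$$

where the supremum is over the power constraint

$$P_T := \frac{1}{T+1}\,\mathrm{tr}\left(B_T\mathcal{Z}_T\mathcal{Z}_T'B_T' + (I + B_T)K_r^{(T)}(I + B_T)'\right) \le \mathcal{P}. \tag{88}$$

It is easily seen that the capacity in this form is identical to (82).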
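A sketch of why the two forms agree is given below. It assumes, as the notation suggests, that $N^T$ is white with unit variance so that $K_Z^{(T)} = \mathcal{Z}_T\mathcal{Z}_T'$, and that $\mathcal{Z}_T$ is lower triangular with unit diagonal; $I + B_T$ is unit lower triangular because $B_T$ is strictly lower triangular.

$$\begin{aligned}
\det\mathcal{Z}_T &= \det(I + B_T) = 1, \qquad K_v^{(T)} = (I + B_T)K_r^{(T)}(I + B_T)' \ \text{ by (84)},\\
\frac{\det\left((I+B_T)K_Z^{(T)}(I+B_T)' + K_v^{(T)}\right)}{\det K_Z^{(T)}}
&= \frac{\det\left((I+B_T)(\mathcal{Z}_T\mathcal{Z}_T' + K_r^{(T)})(I+B_T)'\right)}{\det(\mathcal{Z}_T\mathcal{Z}_T')}
= \det\left(\mathcal{Z}_T\mathcal{Z}_T' + K_r^{(T)}\right).
\end{aligned}$$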


D. Relation between the CP structure for ISI Gaussian channel and the general coding structure 


We can establish a correspondence between the CP structure for the ISI Gaussian channel $\mathcal{F}$ in Fig. 11 and the general coding structure for $\mathcal{F}$ in Fig. 2. In fact, the general coding structure for $\mathcal{F}$ in Fig. 2 was initially motivated by the CP structure for channel $\mathcal{F}$ in Fig. 11.

For any fixed $(K_r^{(T)}, B_T)$ in the CP structure, define in the general coding structure that


Qt 

A 

C 


(F + Bt) v BtZt 


’ 0 

It 

* 

* 

0 • 

• O" 




(89) 


where $\Gamma_0 := (K_r^{(T)})^{1/2}$, and $*$ can be any number. (Note that the case in which $K_r^{(T)} \ge 0$ but $K_r^{(T)}$ is not positive definite can be approached by a sequence of positive definite $K_r^{(T)}$, and thus it is sufficient to consider only positive definite $K_r^{(T)}$ in establishing the correspondence relation of the two structures.) Then it is easily verified that $\mathcal{G}_T$ is strictly lower triangular, $(A, C)$ is observable with a nonsingular observability matrix $\Gamma = \Gamma_0$, and $A$ can have eigenvalues not on the unit circle and not at the locations of $F$'s eigenvalues. Therefore, for any given $(K_r^{(T)}, B_T)$, we can find an admissible $(A, C, \mathcal{G}_T)$, and it is straightforward to verify that they generate identical channel inputs $u^T$.

Conversely, for any fixed admissible $(A, C, \mathcal{G}_T)$ with $A \in \mathbb{R}^{(n+1)\times(n+1)}$, we can obtain an admissible $(K_r^{(T)}, B_T)$ as

$$\begin{aligned}
B_T &:= \mathcal{G}_T\mathcal{Z}_T^{-1}(I - \mathcal{G}_T\mathcal{Z}_T^{-1})^{-1}\\
K_r^{(T)} &:= \Gamma(A,C)\Gamma(A,C)',
\end{aligned} \tag{90}$$

which generates identical channel input $u^T$ as $(A, C, \mathcal{G}_T)$ does.


7 This $v^T$ is called innovations in [12], [35]; it should not be confused with the Kalman filter innovations in this paper.

























As a result of the above reasoning, there is a correspondence between the CP structure for $\mathcal{F}$ and the general coding structure, and the maximum rate over all admissible $(K_r^{(T)}, B_T)$ (namely $C_T$) equals that over all admissible $(A, C, \mathcal{G}_T)$. In other words, we have

Lemma 1.

$$C_T(\mathcal{P}) = C_{T,T}(\mathcal{P}). \tag{91}$$

Proof: Note that $C_{T,T}$ is the maximum rate over all admissible $(A, C, \mathcal{G}_T)$ with $A \in \mathbb{R}^{(T+1)\times(T+1)}$.

□

This lemma implies that the general coding structure with the extra constraint $T = n$ becomes the CP structure; that is, in the CP structure, the dimension of $A$ is equal to the horizon length. One advantage of considering the general coding structure is that we can allow $T \neq n$, which makes it possible to increase the horizon length to infinity without increasing the dimension of $A$, a crucial step towards the Kalman filtering characterization of the feedback communication problem.

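Our study on the general coding structure also refines the CP structure. We can now identify more specific structure of the optimal $(K_v^{(T)}, B_T)$. Indeed, we conclude that the CP structure needs to have a Kalman filter inside. We may further determine the optimal form of $B_T$. Substituting (72) into (90), we have that

$$B_T^* = -\widehat{\mathcal{G}}_T^*(A,C)\mathcal{Z}_T^{-1}. \tag{92}$$

Therefore, to achieve $C_T$ in the CP structure, it is sufficient to search $(K_v^{(T)}, B_T)$ in the form of

$$\begin{aligned}
K_v^{(T)*} &:= \left(I - \widehat{\mathcal{G}}_T^*(A,C)\mathcal{Z}_T^{-1}\right)\Gamma(A,C)\Gamma(A,C)'\left(I - \widehat{\mathcal{G}}_T^*(A,C)\mathcal{Z}_T^{-1}\right)'\\
B_T^* &:= -\widehat{\mathcal{G}}_T^*(A,C)\mathcal{Z}_T^{-1}.
\end{aligned} \tag{93}$$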

Additionally, as T tends to infinity, it can be easily shown that {ry} is a stable process in order to 
achieve Coo. 

E. Proof of Proposition 4: Necessary condition for optimality

In this subsection, we show that our general coding structure, in the form of (Eli, satisfies the necessary condition for optimality as presented in Proposition 4.

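Since $\{y_t\}$ is interchangeable with the innovations process $\{e_t\}$, in the sense that they determine each other causally and linearly, it suffices to show that $\mathrm{E}\,u_te_\tau = 0$ for any $\tau < t$. Note that

$$u_t = \mathbb{D}\mathbb{X}_t = \mathbb{D}\mathbb{A}\mathbb{X}_{t-1} - \mathbb{D}L_{t-1}e_{t-1}, \tag{94}$$

and thus

$$\begin{aligned}
\mathrm{E}\,u_te_{t-1} &= \mathrm{E}\,\mathbb{D}\mathbb{A}\mathbb{X}_{t-1}e_{t-1} - \mathbb{D}L_{t-1}K_{e,t-1}\\
&\overset{(a)}{=} \mathrm{E}\,\mathbb{D}\mathbb{A}\mathbb{X}_{t-1}\mathbb{X}_{t-1}'\mathbb{C}' + \mathrm{E}\,\mathbb{D}\mathbb{A}\mathbb{X}_{t-1}N_{t-1} - \mathbb{D}\mathbb{A}\Sigma_{t-1}\mathbb{C}'\\
&= \mathbb{D}\mathbb{A}\Sigma_{t-1}\mathbb{C}' + 0 - \mathbb{D}\mathbb{A}\Sigma_{t-1}\mathbb{C}' = 0,
\end{aligned} \tag{95}$$

where (a) follows from (42) and (43). Similarly we can prove $\mathrm{E}\,u_te_\tau = 0$ for any $\tau < t - 1$.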

Appendix III 

Infinite-horizon: The properties of the general coding structure 
A. Proof of Proposition 0 Convergence to steady-state 

In this subsection, we show that system EJ converges to a steady-state, as given by (BTT i. To this 
aim, we first transform the Riccati recursion into a new coordinate system, then show that it converges 
to a limit, and finally prove that the limit is the unique stabilizing solution of the Riccati equation. The 
convergence to the steady-state follows immediately from the convergence of the Riccati recursion. 


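Consider a coordinate transformation given as

$$\bar{\mathbb{A}} := \Phi\mathbb{A}\Phi^{-1} = \begin{bmatrix} A & 0\\ 0 & F \end{bmatrix}, \qquad \bar{\mathbb{C}} := \mathbb{C}\Phi^{-1}, \tag{96}$$

where

$$\Phi := \begin{bmatrix} I_{n+1} & 0\\ \phi & -I_m \end{bmatrix} \tag{97}$$

and $\phi$ is the unique solution to the Sylvester equation

$$F\phi - \phi A = -GC. \tag{98}$$

Note that the existence and uniqueness of $\phi$ is guaranteed by the assumption on $A$ that $\lambda_i(-A) + \lambda_j(F) \neq 0$ for any $i$ and $j$ (see Section IV-A).

This transformation transforms $\mathbb{A}$ into block-diagonal form with the unstable and stable eigenvalues in different blocks, and transforms the initial condition $\Sigma_0$ to

$$\bar\Sigma_0 := \Phi\Sigma_0\Phi' = \Phi\begin{bmatrix} I_{n+1} & 0\\ 0 & 0 \end{bmatrix}\Phi' = \begin{bmatrix} I & \phi'\\ \phi & \phi\phi' \end{bmatrix}. \tag{99}$$

Therefore, the convergence of (44) with initial condition $\Sigma_0$ is equivalent to the convergence of

$$\bar\Sigma_{t+1} = \bar{\mathbb{A}}\bar\Sigma_t\bar{\mathbb{A}}' - \frac{\bar{\mathbb{A}}\bar\Sigma_t\bar{\mathbb{C}}'\bar{\mathbb{C}}\bar\Sigma_t\bar{\mathbb{A}}'}{\bar{\mathbb{C}}\bar\Sigma_t\bar{\mathbb{C}}' + 1} \tag{100}$$

with initial condition $\bar\Sigma_0$. By [34], $\bar\Sigma_t$ converges if

$$\det\left(\begin{bmatrix} 0 & 0\\ 0 & I \end{bmatrix} - \bar\Sigma_0\begin{bmatrix} I & 0\\ 0 & X_{22} \end{bmatrix}\right) \neq 0, \tag{101}$$

where $X_{22}$ is a positive semi-definite matrix (whose value does not affect our result here). Since

$$\begin{aligned}
\det\left(\begin{bmatrix} 0 & 0\\ 0 & I \end{bmatrix} - \begin{bmatrix} I & \phi'\\ \phi & \phi\phi' \end{bmatrix}\begin{bmatrix} I & 0\\ 0 & X_{22} \end{bmatrix}\right)
&= \det\begin{bmatrix} -I & -\phi'X_{22}\\ -\phi & I - \phi\phi'X_{22} \end{bmatrix}\\
&= \det(-I)\det\left(I - \phi\phi'X_{22} + \phi\phi'X_{22}\right)\\
&\neq 0,
\end{aligned} \tag{102}$$

we conclude that $\bar\Sigma_t$ converges to a limit $\bar\Sigma_\infty$.

This limit $\bar\Sigma_\infty$ is a positive semi-definite solution to

$$\bar\Sigma_\infty = \bar{\mathbb{A}}\bar\Sigma_\infty\bar{\mathbb{A}}' - \frac{\bar{\mathbb{A}}\bar\Sigma_\infty\bar{\mathbb{C}}'\bar{\mathbb{C}}\bar\Sigma_\infty\bar{\mathbb{A}}'}{\bar{\mathbb{C}}\bar\Sigma_\infty\bar{\mathbb{C}}' + 1}. \tag{103}$$

By [31], (103) has a unique stabilizing solution because $(\bar{\mathbb{A}}, \bar{\mathbb{C}})$ is observable and $\bar{\mathbb{A}}$ does not have any eigenvalues on the unit circle. Therefore, $\bar\Sigma_\infty$ is this unique stabilizing solution, which can be computed from (103) as (see also [34])

$$\bar\Sigma_\infty = \begin{bmatrix} \bar\Sigma_{11} & 0\\ 0 & 0 \end{bmatrix}, \tag{104}$$

where $\bar\Sigma_{11}$ is the positive-definite solution to a reduced order Riccati equation

$$\bar\Sigma_{11} = A\bar\Sigma_{11}A' - \frac{A\bar\Sigma_{11}(C + H\phi)'(C + H\phi)\bar\Sigma_{11}A'}{(C + H\phi)\bar\Sigma_{11}(C + H\phi)' + 1} \tag{105}$$

and has rank $(n + 1)$ (cf. [34]). Thus, $\Sigma_t$ converges to

$$\Sigma_\infty = \Phi^{-1}\bar\Sigma_\infty\Phi'^{-1} = \begin{bmatrix} \bar\Sigma_{11} & \bar\Sigma_{11}\phi'\\ \phi\bar\Sigma_{11} & \phi\bar\Sigma_{11}\phi' \end{bmatrix} \tag{106}$$

with rank $(n + 1)$.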
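To make the convergence claim concrete, the following minimal sketch iterates the Riccati recursion in the simplest scalar setting: a single unstable encoder mode `a` with output gain `c` and unit-variance noise, with no channel state. This scalar reduction, the parameter values, and the function names are assumptions made for illustration only. In this case the limit is $(a^2 - 1)/c^2$ and the steady-state quantity $\frac{1}{2}\log_2(c^2\Sigma_\infty + 1)$ equals $\log_2|a|$.

```python
import math


def riccati_step(sigma, a, c):
    """One step of the scalar Riccati recursion
    sigma+ = a^2*sigma - (a*sigma*c)^2 / (c^2*sigma + 1)."""
    return a * a * sigma - (a * sigma * c) ** 2 / (c * c * sigma + 1.0)


def riccati_limit(a, c, sigma0=1.0, steps=200):
    """Iterate the recursion from sigma0 and return the final value."""
    sigma = sigma0
    for _ in range(steps):
        sigma = riccati_step(sigma, a, c)
    return sigma


a, c = 2.0, 1.0                      # hypothetical unstable mode and gain
sigma_inf = riccati_limit(a, c)      # approaches (a^2 - 1) / c^2
rate_bits = 0.5 * math.log2(c * c * sigma_inf + 1.0)  # approaches log2|a|
```

The recursion contracts towards the fixed point at rate $1/a^2$ per step in this scalar case, so a few dozen iterations already suffice.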



































B. Infinite-horizon feedback capacities 

If the noise in the colored Gaussian channel forms a stationary (or asymptotically stationary) process, then $C_T(\mathcal{P})$ has a finite limit (cf. [15]; the proof utilizes the superadditivity of $C_T$, similar to the case of forward communication capacities studied in [36]), which also has the operational and information meanings. Therefore, we have

$$\lim_{T\to\infty} C_T = C_\infty < \infty, \tag{107}$$

where $C_\infty$ is the operational or information infinite-horizon capacity (cf. [6], [11]).

By Lemma 1, the above implies that

$$\lim_{T\to\infty} C_{T,T} = C_\infty. \tag{108}$$

Note that this does not immediately imply that $\lim_{n\to\infty}\lim_{T\to\infty} C_{T,n} = C_\infty$ or $C_\infty = C_s$, since we could not show that the involved limits (including taking the supremum) are interchangeable in this case.


Appendix IV 

Proof of Proposition 13: $K_\xi = 0$

In this section, we prove that $K_\xi$ has to be 0 to ensure the optimality in (E3 i.

We first derive some properties of the communication system using the stationary GM inputs and the steady-state Kalman filtering. The system dynamics is given by

$$\begin{aligned}
u_t &= d'\tilde s_{s,t} + \xi_t\\
s_{t+1} &= Fs_t + Gu_t\\
y_t &= Hs_t + N_t + u_t\\
\tilde s_{s,t} &= s_t - \hat s_{s,t}\\
\hat s_{s,t+1} &= F\hat s_{s,t} + L_se_t\\
e_t &= y_t - H\hat s_{s,t} = (H + d')\tilde s_{s,t} + \xi_t + N_t\\
\tilde s_{s,t+1} &= F\tilde s_{s,t} + Gu_t - L_se_t,
\end{aligned} \tag{109}$$

where $\tilde s_{s,0} = 0$ and $\hat s_{s,0} = 0$. As before, the Kalman filter innovations $\{e_t\}$ will play an important role. The innovations process is white with variance asymptotically equal to

$$K_e = 1 + K_\xi + (H + d')\Sigma_s(H + d')', \tag{110}$$

where $\Sigma_s := \mathrm{E}\,\tilde s_s\tilde s_s'$. Following the same derivation as for Proposition 6, we know that the asymptotic information rate is given by

$$I(\xi; y) = \frac{1}{2}\log K_e, \tag{111}$$

which is consistent with the result in [12].

We now invoke the equivalence between the colored Gaussian channel and the ISI channel $\mathcal{F}$, that is, instead of generating $y$ by (109), we generate $y$ by

$$\begin{aligned}
\tilde y_t &= u_t + Z_t\\
s_{c,t+1} &= Fs_{c,t} + G\tilde y_t\\
y_t &= Hs_{c,t} + \tilde y_t,
\end{aligned} \tag{112}$$

where $s_{c,0} = 0$. Since $Z^T = \mathcal{Z}_TN^T$, the mapping from $(u, N)$ to $y$ here is equivalent to that in (109). Therefore, (109) becomes

$$\begin{aligned}
u_t &= d'\tilde s_{s,t} + \xi_t\\
\tilde y_t &= u_t + Z_t\\
s_{c,t+1} &= Fs_{c,t} + G\tilde y_t\\
y_t &= Hs_{c,t} + \tilde y_t\\
\hat s_{s,t+1} &= F\hat s_{s,t} + L_se_t\\
e_t &= y_t - H\hat s_{s,t} = (H + d')\tilde s_{s,t} + \xi_t + N_t\\
\tilde s_{s,t+1} &= F\tilde s_{s,t} + Gu_t - L_se_t,
\end{aligned} \tag{113}$$

where $\hat s_{s,0} = 0$; see Fig. 12 for the block diagram.



Fig. 12. Block diagram for the communication system using the GM inputs and Kalman filtering, where $s_{c,t}$ is the state for $\mathcal{Z}^{-1}$ with $s_{c,0} = 0$, and $\hat s_{s,t}$ is the state for system $(F, L_s, H, 0)$ with $\hat s_{s,0} = 0$.


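Our analysis of this system is facilitated by considering transfer functions. Note that

$$\mathbb{T}_{\xi u} = \mathbb{S}, \qquad \mathbb{T}_{Nu} = \mathbb{T}\mathcal{Z}, \tag{114}$$

where $\mathbb{S}$ is the sensitivity, and $\mathbb{T} := \mathbb{S} - 1$ is the complementary sensitivity. (The sensitivity $\mathbb{S}$ here should not be confused with the sensitivity in Section IV-A.) Then we have

$$\begin{aligned}
u &= \mathbb{S}\xi + \mathbb{T}\mathcal{Z}N\\
\tilde y &= \mathbb{S}(\xi + \mathcal{Z}N).
\end{aligned} \tag{115}$$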


Now assume that $d$ and $K_\xi$ form the optimal solution to O. where $K_\xi \neq 0$, for the purpose of contradiction. We can then compute the corresponding optimal $\Sigma_s$, $L_s$, $\mathbb{S}$, $\mathbb{T}$, etc. Fix the optimal $L_s$, $\mathbb{S}$, and $\mathbb{T}$. We will show that this leads to: 1) the whiteness of $\{\tilde y_t\}$; 2) $L_s = G$; 3) $K_\xi = 0$, and hence a contradiction.

1) For fixed optimal values of $L_s$, $\mathbb{S}$, and $\mathbb{T}$, suppose that we have the freedom of choosing the power spectrum of $\xi$ in (113). Since we have assumed the optimality of a white process $\{\xi_t\}$, it must hold that any correlated process $\{\xi_{c,t}\}$ does not lead to a larger mutual information than $\{\xi_t\}$ does. Precisely, assume a stationary correlated process $\{\xi_{c,t}\}$ replaces the white process $\{\xi_t\}$ in (113). Then $\{\xi_t\}$ yields the maximum achievable rate over all possible $\{\xi_{c,t}\}$, i.e., it solves

$$\max_{S_{\xi_c}(e^{j2\pi\theta});\ L_s,\,\mathbb{S},\,\mathbb{T}\ \text{fixed}} I(\xi_c; \tilde y) \quad \text{s.t. } \mathrm{E}\,u^2 \le \mathcal{P}. \tag{116}$$

Since

$$I(\xi_c; \tilde y) = h(\tilde y) - h(\tilde y \mid \xi_c) = h(\tilde y) - h(\mathbb{S}\mathcal{Z}N) \tag{117}$$


































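and $h(\mathbb{S}\mathcal{Z}N)$ is fixed for fixed $\mathbb{S}$, the above optimization is equivalent to

$$\begin{aligned}
&\max_{S_{\xi_c}(e^{j2\pi\theta})}\ \frac{1}{2}\int_{-1/2}^{1/2}\log S_{\tilde y}(e^{j2\pi\theta})\,d\theta\\
&\text{s.t. } \mathrm{E}\,u^2 = \int_{-1/2}^{1/2}\left[S_{\mathbb{S}}(e^{j2\pi\theta})S_{\xi_c}(e^{j2\pi\theta}) + S_{\mathbb{T}}(e^{j2\pi\theta})S_Z(e^{j2\pi\theta})\right]d\theta \le \mathcal{P}.
\end{aligned} \tag{118}$$

However, this optimization problem is equivalent to solving, for some $\mathcal{P}_1 > 0$,

$$\begin{aligned}
&\max_{S_{\xi_c}(e^{j2\pi\theta})}\ \frac{1}{2}\int_{-1/2}^{1/2}\log\left(S_{\mathbb{S}}(e^{j2\pi\theta})S_{\xi_c}(e^{j2\pi\theta}) + S_{\mathbb{S}}(e^{j2\pi\theta})S_Z(e^{j2\pi\theta})\right)d\theta\\
&\text{s.t. } \int_{-1/2}^{1/2}S_{\mathbb{S}}(e^{j2\pi\theta})S_{\xi_c}(e^{j2\pi\theta})\,d\theta \le \mathcal{P}_1,
\end{aligned} \tag{119}$$

which we identify as a new forward communication problem; see Fig. 13. In this problem, we want to tune the power spectrum of $\mathbb{S}\xi_c$, the effective channel input, to get the maximum rate. The optimal solution is given by waterfilling, namely, the power spectrum $S_{\mathbb{S}}(e^{j2\pi\theta})S_{\xi_c}(e^{j2\pi\theta})$ needs to waterfill the power spectrum $S_{\mathbb{S}}(e^{j2\pi\theta})S_Z(e^{j2\pi\theta})$. By optimality of $\{\xi_t\}$, $K_\xi S_{\mathbb{S}}(e^{j2\pi\theta})$ is the waterfilling solution.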


Fig. 13. An equivalent forward communication channel. Here $\mathbb{S}\xi_c$ is the effective input, $\mathbb{S}\mathcal{Z}N$ is the effective channel noise, and $\tilde y$ is the output.


Since $S_{\mathbb{S}}(e^{j2\pi\theta}) = 0$ for some $\theta$ if and only if $\mathbb{S}(z)$ has a zero for that $\theta$ on the unit circle, and since $\mathbb{S}(z)$ is a finite-dimensional transfer function with a finite number of zeros, the power spectrum $S_{\mathbb{S}}(e^{j2\pi\theta})$ cannot have zero amplitude on any interval. It follows that the support of the channel input spectrum $K_\xi S_{\mathbb{S}}(e^{j2\pi\theta})$ is $[-1/2, 1/2]$.

In waterfilling, if the support of the input spectrum is $[-1/2, 1/2]$, then the output spectrum must be flat. This is easily proven by contradiction. Thus, $\{\tilde y_t\}$ is a white process. Let us assume that its variance is $\sigma^2$.
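The flat-output property of waterfilling can be seen in a small discretized example. The sketch below is an illustration only: the noise samples, the power budget, and the function name `waterfill` are hypothetical, and the frequency axis is replaced by a handful of bins. It finds the water level by bisection and shows that, when every bin receives positive power (full support), input plus noise is constant across bins.

```python
def waterfill(noise, power, iters=100):
    """Waterfilling over discrete bins.

    Returns (level, allocation) such that allocation[i] = max(level - noise[i], 0)
    and sum(allocation) equals `power` up to bisection error."""
    lo, hi = min(noise), max(noise) + power
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        used = sum(max(mid - n, 0.0) for n in noise)
        if used > power:
            hi = mid
        else:
            lo = mid
    level = 0.5 * (lo + hi)
    return level, [max(level - n, 0.0) for n in noise]


noise = [1.0, 2.0, 0.5, 1.5]          # hypothetical effective-noise spectrum samples
level, alloc = waterfill(noise, power=4.0)
output = [p + n for p, n in zip(alloc, noise)]  # flat when all bins are active
```

With these numbers all four bins are active, so every entry of `output` equals the water level; lowering the budget until some bin receives zero power breaks the flatness, which is the contrapositive used in the argument above.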

2) Note that both $\tilde y$ and $e$ have white spectrum, which imposes a condition on the choice of $L_s$. The transfer function $\mathbb{T}_{\tilde ye}$ is illustrated in Fig. 14, where we can see that its structure is a Kalman filter structure. To make $e$ white, it is necessary to choose $L_s$ to be the Kalman filter gain (cf. [31]), given by

$$L_s = \frac{F\Sigma_cH' + \sigma^2G}{H\Sigma_cH' + \sigma^2}, \tag{120}$$

where $\Sigma_c$ is the estimation error covariance matrix and is a nonnegative solution to the Riccati equation

$$\Sigma_c = F\Sigma_cF' + \sigma^2GG' - \frac{(F\Sigma_cH' + \sigma^2G)(F\Sigma_cH' + \sigma^2G)'}{H\Sigma_cH' + \sigma^2}. \tag{121}$$

Clearly, $\Sigma_c = 0$ is a solution to the Riccati equation. By [31], it is also the unique nonnegative solution. Hence, we need to choose $L_s := G$.
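The substitution behind this step is short enough to spell out. Setting $\Sigma_c = 0$ in the Riccati equation and in the gain formula gives

$$\begin{aligned}
\sigma^2GG' - \frac{(\sigma^2G)(\sigma^2G)'}{\sigma^2} &= \sigma^2GG' - \sigma^2GG' = 0,\\
L_s &= \frac{\sigma^2G}{\sigma^2} = G,
\end{aligned}$$

so the zero matrix indeed satisfies the Riccati equation and the corresponding Kalman gain reduces to $G$.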

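3) The fact that $L_s = G$ leads to a reduction of system (109), or equivalently (113). We have

$$\begin{aligned}
\tilde s_{s,t+1} &= (F - GH)\tilde s_{s,t} - GN_t\\
\Sigma_s &= (F - GH)\Sigma_s(F - GH)' + GG'.
\end{aligned} \tag{122}$$

Fig. 14. The state-space representation of the transfer function $\mathbb{T}_{\tilde ye}$.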


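In the case that $(F - GH)$ is unstable, the closed-loop of (113) is unstable and cannot transmit information. In the case that $(F - GH)$ is stable, the steady-state of $\Sigma_s$ depends only on $(F, G, H)$ and is independent of the choice of $d$ and $K_\xi$, and thus < Rn?t becomes

$$\begin{aligned}
C_\infty = &\max_{\Sigma_s\ \text{fixed},\ d\in\mathbb{R}^m,\ K_\xi\in\mathbb{R}}\ \frac{1}{2}\log\left(1 + K_\xi + (H + d')\Sigma_s(H + d')'\right)\\
&\text{s.t. } \mathcal{P} = d'\Sigma_sd + K_\xi.
\end{aligned} \tag{123}$$

This is equivalent to

$$\max_{d\in\mathbb{R}^m,\ K_\xi\in\mathbb{R}}\ H\Sigma_sd \quad \text{s.t. } d'\Sigma_sd \le \mathcal{P} - K_\xi, \tag{124}$$

which requires $K_\xi = 0$.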
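The reduction from (123) to (124) can be made explicit as follows; this is a sketch of the omitted algebra, and the closed-form maximum in the last line additionally assumes that $\Sigma_s$ is positive definite.

$$\begin{aligned}
1 + K_\xi + (H + d')\Sigma_s(H + d')' &= 1 + K_\xi + H\Sigma_sH' + 2H\Sigma_sd + d'\Sigma_sd\\
&= 1 + \mathcal{P} + H\Sigma_sH' + 2H\Sigma_sd \qquad (\text{using } \mathcal{P} = d'\Sigma_sd + K_\xi),\\
\max_{d'\Sigma_sd \le \mathcal{P} - K_\xi} H\Sigma_sd &= \sqrt{(\mathcal{P} - K_\xi)\,H\Sigma_sH'}.
\end{aligned}$$

With $\Sigma_s$ fixed, only the term $H\Sigma_sd$ depends on the design variables, and its maximum is non-decreasing in the budget $\mathcal{P} - K_\xi$, which is largest when $K_\xi = 0$.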


Appendix V 

Optimality of the proposed coding scheme 
A. Proof of Proposition |5J Finite dimensionality of the optimal scheme 

i) To show that $C_{\infty,n}$ is non-decreasing as $n$ increases, note that an encoder $(A, C)$ of dimension $(n + 1)$ can be arbitrarily approximated by a sequence of encoders $\{(A_i, C_i)\}$ of dimension $(n + 2)$ in the form of

A 

0 

0 

1 


C 


(125) 


and therefore the supremum in m with encoder dimension $(n + 2)$ is no smaller than the supremum with encoder dimension $(n + 1)$. So $C_{\infty,n}$ is non-decreasing in $n$.

ii) By Proposition 6 and the definition for $C_{\infty,m-1}(\mathcal{P})$, the optimization problem for solving $C_{\infty,m-1}(\mathcal{P})$ is given by


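$$\begin{aligned}
C_{\infty,m-1}(\mathcal{P}) = &\sup_{A\in\mathbb{R}^{m\times m},\ C}\ \frac{1}{2}\log(\mathbb{C}\Sigma\mathbb{C}' + 1)\\
&\text{s.t. } \Sigma = \mathbb{A}\Sigma\mathbb{A}' - \mathbb{A}\Sigma\mathbb{C}'(\mathbb{C}\Sigma\mathbb{C}' + 1)^{-1}\mathbb{C}\Sigma\mathbb{A}'.
\end{aligned} \tag{126}$$

To compare it with $C_\infty(\mathcal{P})$, we rewrite (62) and (63) in another form, incorporating $K_\xi = 0$. Define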

A := 


E : = 


F + Gd' 

0 

Gd! 

F 

i H} 


i 0] 







_ 
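It is then straightforward to verify that

$$\begin{aligned}
\frac{1}{2}\log\left(1 + (H + d')\Sigma_s(H + d')'\right) &= \frac{1}{2}\log(1 + \mathbb{C}\Sigma\mathbb{C}')\\
d'\Sigma_sd &= \mathbb{D}\Sigma\mathbb{D}'\\
\mathbb{A}\Sigma\mathbb{A}' - \mathbb{A}\Sigma\mathbb{C}'(\mathbb{C}\Sigma\mathbb{C}' + 1)^{-1}\mathbb{C}\Sigma\mathbb{A}' &= \Sigma,
\end{aligned} \tag{128}$$

which yields that

$$\begin{aligned}
C_\infty(\mathcal{P}) = &\sup\ \frac{1}{2}\log(1 + \mathbb{C}\Sigma\mathbb{C}')\\
&\text{s.t. } \Sigma = \mathbb{A}\Sigma\mathbb{A}' - \mathbb{A}\Sigma\mathbb{C}'(\mathbb{C}\Sigma\mathbb{C}' + 1)^{-1}\mathbb{C}\Sigma\mathbb{A}'.
\end{aligned} \tag{129}$$

Comparing (129) with (126), we conclude that $C_{\infty,m-1}(\mathcal{P}) \ge C_\infty(\mathcal{P})$. However, since for each $(A, C)$, the channel input sequence is stationary by the steady-state characterization of the general coding structure, it holds that $C_{\infty,m-1}(\mathcal{P}) \le C_\infty(\mathcal{P})$. Therefore, we have

$$C_{\infty,m-1}(\mathcal{P}) = C_\infty(\mathcal{P}). \tag{130}$$

Then ii) follows from i) immediately.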

B. Proof of Proposition® Achieving CA in the information sense 

By Proposition 8, the optimization problem for solving $P_\infty(\mathcal{R})$ in © (which is equivalent to solving $C_\infty(\mathcal{P})$) can be reformulated as

$$\begin{aligned}
[A_{\mathrm{opt}}, C_{\mathrm{opt}}, \mathbb{D}_{\mathrm{opt}}] := &\arg\inf_{A\in\mathbb{R}^{m\times m},\ C}\ \mathbb{D}\Sigma\mathbb{D}'\\
&\text{s.t. } \Sigma = \mathbb{A}\Sigma\mathbb{A}' - \mathbb{A}\Sigma\mathbb{C}'(\mathbb{C}\Sigma\mathbb{C}' + 1)^{-1}\mathbb{C}\Sigma\mathbb{A}',\\
&\phantom{\text{s.t. }} \log DI(A) = \mathcal{R}
\end{aligned} \tag{131}$$

for any desired rate $\mathcal{R}$. Without loss of generality, we may assume that $(A, C)$ is in the observable canonical form, i.e.

$$A := \begin{bmatrix} 0_{n\times 1} & I_n\\ a_n & a_{n-1}\ \cdots\ a_0 \end{bmatrix}, \qquad C := \begin{bmatrix} 1 & 0_{1\times n} \end{bmatrix}. \tag{132}$$

Observe that $|\det A| = |a_n|$. Thus, $DI(A) = |\det A| = |a_n|$ if $A$ does not contain stable eigenvalues, and $DI(A) > |\det A| = |a_n|$ otherwise.

As a consequence, if we search over $A$ with $a_n$ fixed to be $2^{\mathcal{R}}$ or $-2^{\mathcal{R}}$, we actually enforce $DI(A) \ge 2^{\mathcal{R}}$. However, the optimal solution must satisfy $DI(A_{\mathrm{opt}}) = 2^{\mathcal{R}}$, since otherwise the system has a rate equal to $R_{\infty,m-1} = \log DI(A_{\mathrm{opt}}) > \mathcal{R}$, which would require more power than the case in which $R_{\infty,m-1} = \mathcal{R}$; notice that (131) is a power minimization problem. To summarize, we can remove the constraint $\log DI(A) = \mathcal{R}$ by letting $a_n = \pm 2^{\mathcal{R}}$ in (132), and the optimal solution $A$ does not contain stable eigenvalues. Furthermore, note that unit-circle eigenvalues do not generate any rate or power and hence can be removed. Thus, if $A_{\mathrm{opt}}$ has $(n^* + 1)$ unstable eigenvalues, we can solve the optimization problem with $A$ having size $(n^* + 1)$ and the obtained optimal solution still achieves $C_\infty$.


C. Proof of Proposition 1/01 Optimality in the analog transmission 
The end-to-end distortion is given by

$$\begin{aligned}
\mathrm{MSE}(\hat W_t) &= \mathrm{E}(W - \hat W_t)(W - \hat W_t)'\\
&= \mathrm{E}(x_0 - \hat x_{0,t})(x_0 - \hat x_{0,t})'\\
&= \mathrm{E}(A^{-t-1}x_{t+1} - A^{-t-1}\hat x_{t+1})(A^{-t-1}x_{t+1} - A^{-t-1}\hat x_{t+1})'\\
&= A^{-t-1}\Sigma_{x,t+1}A'^{-t-1},
\end{aligned} \tag{133}$$

where

$$\Sigma_{x,t+1} := [I, 0]\,\Sigma_{t+1}\,[I, 0]' \tag{134}$$

and the expectation is w.r.t. the randomness in $W$ and $\hat W_t$. By rate-distortion theory, the above distortion needs an asymptotic rate $R$ satisfying

$$\begin{aligned}
R &\ge \lim_{t\to\infty}\frac{1}{2(t+1)}\log\frac{1}{\det\mathrm{MSE}(\hat W_t)}\\
&= \lim_{t\to\infty}\frac{1}{2(t+1)}\log\frac{\det A^{2t+2}}{\det\Sigma_{x,t+1}}\\
&= \log|\det A|.
\end{aligned} \tag{135}$$

From Proposition [d] we know that $\log|\det A^*|$ equals $C_\infty$ and the average channel input power equals $\mathcal{P}$. Because $C_\infty$ is the supremum of asymptotic rate, it follows that the equality in (135) is achieved. Then we see that the proposition holds.
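The decay-rate computation in (135) can be illustrated in the scalar case. The sketch below assumes a single unstable mode `a` with gain `c`, unit-variance noise, and no channel state, and it uses the steady-state error variance in place of the time-varying one; these simplifications and all names are assumptions for illustration. It evaluates the finite-horizon estimate of the rate in the log domain (to avoid overflow) and shows it approaching $\log_2|a|$.

```python
import math


def steady_state_variance(a, c, steps=200):
    """Steady-state scalar error variance from sigma+ = a^2*sigma / (c^2*sigma + 1)."""
    sigma = 1.0
    for _ in range(steps):
        sigma = a * a * sigma / (c * c * sigma + 1.0)
    return sigma


def rate_estimate_bits(a, c, t):
    """Finite-horizon estimate -(1/(2(t+1))) * log2(MSE_t),
    with MSE_t = a^(-2(t+1)) * sigma evaluated in the log domain."""
    sigma = steady_state_variance(a, c)
    log2_mse = -2.0 * (t + 1) * math.log2(abs(a)) + math.log2(sigma)
    return -log2_mse / (2.0 * (t + 1))


short_horizon = rate_estimate_bits(2.0, 1.0, 10)
long_horizon = rate_estimate_bits(2.0, 1.0, 1000)  # close to log2(2) = 1 bit
```

The gap to $\log_2|a|$ is $\log_2(\Sigma)/(2(t+1))$, which vanishes as the horizon grows because the error variance stays bounded.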

D. Proof of Proposition \m Optimality in digital transmission 

It is sufficient to show that $R_{\infty,n}(A,C)$ is achievable for any fixed $(A,C)$. To show this, for the fixed $(A,C)$, construct the scheme in Fig. 3 and use $\mathcal{G}_T^*$, the Kalman-filter based optimal receiver. The closed-loop <m> is stabilized and will converge to its steady-state for large enough $T$.

We can then directly verify that Theorems 4.3 and 4.6 in [14] are applicable to the (steady-state) LTI system. These theorems assert that, if the closed-loop system is stabilized, then we can construct a sequence of codes to reliably (in the sense of vanishing probability of error) transmit the initial conditions associated with the open-loop unstable eigenvalues of $A$ (denoted $a_0, \cdots, a_k$, if any), at a rate

$$R := (1 - \epsilon)R_{\infty,n}(A,C) \tag{136}$$

for any $\epsilon > 0$, and in the meantime, $P_{\infty,n}(A,C) \le \mathcal{P}$ holds. Therefore, we conclude that, for any $(A, C)$, the portion of $W$ that is associated with the unstable eigenvalues of $A$ is transmitted reliably from the transmitter to the receiver at a rate arbitrarily close to $R_{\infty,n}(A,C)$. Moreover, we notice that we can achieve $C_{\infty,n}$ by a sequence of purely unstable $(A, C)$ (i.e. $k = n$), in which the initial condition $W$ is the message being transmitted. It follows that $W$ is transmitted at the capacity rate.

In addition, [14] showed that for any choice of $x_0$, it holds that


PE t = 1 - ft ( 1 - 2Q 


i =0 


(137)

where $\sigma_{T,i}$ is the square root of the $i$th eigenvalue of $\mathrm{MSE}(\hat x_{0,T})$, with

$$\mathrm{MSE}(\hat x_{0,T}) = \mathrm{E}(x_0 - \hat x_{0,T})(x_0 - \hat x_{0,T})' = A^{-T-1}\Sigma_{x,T+1}A'^{-T-1}. \tag{138}$$

Note that the expectation is w.r.t. the randomness in $\hat x_{0,T}$ only, different from (133), and that asymptotically $\Sigma_{T+1}$ and hence $\Sigma_{x,T+1}$ are independent of the choice of $x_0$.

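It then holds for each $i$,

$$\begin{aligned}
(\sigma_{T,i})^2 &\le \lambda_{\max}(\mathrm{MSE}(\hat x_{0,T}))\\
&= \lambda_{\max}(A^{-T-1}\Sigma_{x,T+1}A'^{-T-1})\\
&\overset{(a)}{=} \lambda_{\max}(A'^{-T-1}A^{-T-1}\Sigma_{x,T+1})\\
&\overset{(b)}{\le} \bar\sigma(A'^{-T-1}A^{-T-1}\Sigma_{x,T+1})\\
&\overset{(c)}{\le} \bar\sigma(A^{-T-1})^2\,\bar\sigma(\Sigma_{x,T+1}),
\end{aligned} \tag{139}$$

where $\lambda_{\max}(M)$ denotes the maximum eigenvalue of $M$, $\bar\sigma(M)$ denotes the maximum singular value of $M$, (a) follows from $\lambda(AB) = \lambda(BA)$, (b) follows from $|\lambda(A)| \le \bar\sigma(A)$, and (c) is because the maximum singular value is an induced norm. Since $\Sigma_{x,T+1}$ converges to its steady-state value exponentially, the above implies that, for $T$ large enough, each $\sigma_{T,i}$ decays to zero exponentially as $T$ increases.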

Now using the union bound and the Chernoff bound, we have 

n 

PE t < ^2 2Q 

i= 0 
n 

- E71 

i=0 v 

and hence $PE_T$ decreases to zero doubly exponentially since $\epsilon > 0$ and $\sigma_{T,i}$ decays exponentially. Thus we prove the proposition.
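The doubly exponential decay can be illustrated numerically. The sketch below evaluates the Gaussian tail function and its Chernoff bound $Q(x) \le e^{-x^2/2}$ for $x \ge 0$, and then applies the bound to an argument that grows exponentially in the horizon. The growth factor and the form of the argument are hypothetical stand-ins, since the exact argument of $Q$ depends on the code construction in [14].

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def chernoff_bound(x):
    """Chernoff bound Q(x) <= exp(-x^2 / 2), valid for x >= 0."""
    return math.exp(-0.5 * x * x)


# Hypothetical Q-function argument that grows exponentially in the horizon T.
growth = 1.2
horizons = list(range(0, 30, 5))
bounds = [chernoff_bound(growth ** T) for T in horizons]
```

Because the exponent of the bound is itself exponential in the horizon, the entries of `bounds` fall off far faster than any geometric sequence, and they underflow to zero in floating point after a few dozen steps.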